The Visual Analogue Scale for Rating, Ranking and Paired-Comparison (VASRRP): A new technique for psychological measurement
Abstract
Traditionally, the visual analogue scale (VAS) has been proposed to overcome the limitations of ordinal measures from Likert-type scales. However, whether VASs can also overcome the response-style limitations of Likert-type scales has not yet been addressed. Previous research using ranking and paired comparisons to compensate for the response styles of Likert-type scales has suffered from limitations, such as that the total score of ipsative measures is a constant that cannot be analyzed by means of many common statistical techniques. In this study we propose a new scale, called the Visual Analogue Scale for Rating, Ranking, and Paired-Comparison (VASRRP), which can be used to collect rating, ranking, and paired-comparison data simultaneously, while avoiding the limitations of each of these data collection methods. The characteristics, use, and analytic method of VASRRPs, as well as how they overcome the disadvantages of Likert-type scales, ranking, and VASs, are discussed. On the basis of analyses of simulated and empirical data, this study showed that VASRRPs improved reliability and parameter recovery and reduced response-style bias. Finally, we have also designed a VASRRP Generator with which researchers can construct and administer their own VASRRPs.
Keywords
Likert-type scale; Paired comparison; Ranking; Multi-item VAS; VASRRP; CTCU model
Likert-type scales are one of the most popular rating scales used in surveys to measure respondents’ traits. They typically have three or more response categories to choose from, and respondents select the category that reflects their state and trait best (Likert, 1932). However, Likert-type scales have some inherent disadvantages, such as response styles, the fact that they produce ordinal measurement data, and ambiguous numbers of response categories, which prevent the accurate identification of respondents’ latent traits, and also adversely affect the use of statistical analysis methods and subsequent results (Allen & Seaman, 2007). Response styles are the systematic tendencies of respondents in their choices of certain response options (Paulhus, 1981). For example, respondents are inclined to select either neutral or middle response categories (Albaum, 1997; Greenleaf, 1992) or to provide extreme responses (Greenleaf, 1992). These response styles will lead to biased answers, which prevent the respondents’ true characteristics or traits from being obtained (Paulhus, 1981, 1991).
The psychometric property of Likert-type scales is another issue. Likert-type scales are an ordinal-level measure but not an interval-level measure; that is, the response categories have a rank order, but the intervals between values cannot be presumed to be equal (Jamieson, 2004). Ordinal data are usually described using frequencies of responses in each category, and thus the appropriate inferential statistics for ordinal data are those employing nonparametric methods, but not parametric methods, which require interval data (Allen & Seaman, 2007; Bollen, 1989; Jamieson, 2004). Many researchers ignore the problems of Likert-type scales altogether and avoid mentioning them, such as by treating their ordinal data as interval and summing up the subscales (Tabachnick & Fidell, 2001). However, using ordinal data with statistical procedures requiring interval-scale measurements causes problems. For example, Bollen and Barb (1981) showed that the Pearson correlation coefficient is underestimated when computed from ordinal data. Babakus, Ferguson, and Jöreskog (1987) found that using ordinal data generally led to underestimating the factor loadings and overestimating their standard errors. Specifically, the biases induced by using various numbers of ordinal scale points to calculate means, covariance, correlations, and reliability coefficients were derived by Krieg (1999), and he concluded that the more points the better, with a continuous scale being the optimal choice.
Furthermore, researchers hold a wide variety of views on how to determine the appropriate number of response categories for Likert-type scales to use in measurement (Alwin, 1992; Cox, 1980; McKelvie, 1978; Preston & Colman, 2000; Viswanathan, Bergen, Dutta, & Childers, 1996). Alwin (1992) argued that scales with more response categories are more reliable and more valid. Using only a few response categories restricts respondents’ ability to precisely convey how they feel (Viswanathan et al., 1996). In contrast, McKelvie (1978) pointed out that a relatively small number of response categories (five or six) should be used for ease of coding and scoring, and such a format will not significantly reduce reliability. In addition, both Ferrando (2003) and Scherpenzeel and Saris (1997) suggested that the number of response categories used by respondents depended on many factors, such as the type of scale, and respondents’ motivational and cognitive characteristics. These studies with ambiguous or conflicting conclusions make selecting an appropriate number of response categories quite an ordeal. In fact, there may be no optimal number of response alternatives, because regardless of the number chosen, the researcher will still encounter serious issues.
For those who do not wish to ignore the problems inherent to Likert-type scales, there are several approaches to improving their use. The first approach involves using different data collection procedures or different scale formats to measure the respondents’ traits. For example, a comparison, or ipsative, method was proposed to reduce response-style biases because in comparison methods respondents cannot endorse every item, which may consequently eliminate uniform biases such as acquiescent responding (Cheung & Chan, 2002; Cunningham, Cunningham, & Green, 1977; Greenleaf, 1992). A second approach uses visual analogue scales (VASs), which were developed to obtain measurements with more variability and which use a line continuum instead of the five or seven categories used by Likert-type scales to measure latent traits (Flynn, van Schaik, & van Wersch, 2004; Guyatt, Townsend, Berman, & Keller, 1987; Jaeschke, Singer, & Guyatt, 1990). Researchers claimed that allowing participants to place their responses anywhere on a continuous line not only makes VAS free from the problem of determining the number of response categories, but also produces continuous and interval-level measurement data (e.g., Myles, Troedel, Boquest, & Reeves, 1999; Price, McGrath, Rafii, & Buckingham, 1983; Reips & Funke, 2008). The third approach involves using mathematical transformation methods to rescale ordinal data into interval data and remedy the psychometric issue of Likert-type scales. After transformation, ordinal Likert data can be analyzed with suitable statistical techniques (Chimi & Russell, 2009; Cook, Heath, Thompson, & Thompson, 2001; Granberg-Rademacker, 2010; Harwell & Gatti, 2001).
Nevertheless, although the aforementioned approaches have overcome some of the disadvantages of Likert scales, they all introduced their own problems (see the next section). The ideal method, thus, may be to use a scale that is able to collect fine-grained data, and is also able to avoid measurement errors and additional transformation processes, and forestall the potential problems with absolute judgments. Moreover, the new scale should be equipped with a comparison function to reduce response-style biases. Based on these ideas, the first purpose of this study is to propose the Visual Analogue Scale for Rating, Ranking, and Paired-Comparison (VASRRP) for data collection, to ameliorate the measurement quality of ranking, paired comparison, and Likert-type scales through use of multi-item VAS (see The VASRRP section). The second purpose of the study is to empirically evaluate the reliability and parameter recovery of the VASRRP through simulation and empirical studies.
Literature review
The comparison method approach to improving the Likert-type scale
Many other methods have been proposed to tackle the disadvantages of Likert-type scales. The first and most commonly used method is adopting a forced-choice method, such as ranking or paired comparison, to reduce response-style bias. The method of ranking is based on how a respondent ranks multiple items according to a certain criterion or quality. Consider the ranking of personal preferences as an example. The respondent could rank four different items {A, B, C, D} in a single list from the most to the least favorite in the following order: B, C, A, and D. Paired comparison, on the other hand, would group the items in pairs for the comparison: in this case the four items {A, B, C, D} would be grouped as {A, B}, {A, C}, {A, D}, {B, C}, {B, D}, and {C, D}. The respondent is then asked to compare each pair separately in terms of personal preferences. Many studies have pointed out that using ranking or paired comparison can effectively resolve the response style problem of Likert-type scales because comparison methods do not allow the endorsement of every item, and thus eliminate uniform biases such as acquiescent responding (Baron, 1996; Cunningham et al., 1977; Greenleaf, 1992; Randall & Fernandes, 1991). Ranking and paired comparison have been adopted by numerous scales and inventories, such as the Gordon Personal Profile Inventory (Gordon, 1993), the Minnesota Importance Questionnaire (Gay, Weiss, Hendel, Dawis, & Lofquist, 1971), the O*NET Computerized Work Importance Profiler (McCloy et al., 1999a), and the Kolb Learning Style Inventory (Kolb, 2005).
Although ranking and paired comparison may reduce the response-style bias associated with Likert-type scales, they have their own problems. As the number of items increases, paired comparison becomes extremely time-consuming and laborious for participants (Rounds, Miller, & Dawis, 1978), because the number of judgments increases very rapidly with the number of items. From a mathematical point of view, paired comparison and ranking are ipsative measures, and this creates analytical problems or problems related to interpretation (Hicks, 1970; Meade, 2004). For example, the mean, standard deviation, and correlation coefficient of an ipsative measure cannot be used for comparison or interpretation purposes because these values merely represent the ranking of the variables. Moreover, because the sum of the item scores is a constant, each of the rows and columns of the covariance matrix sums to zero; the covariance matrix is therefore singular and does not have an inverse matrix. This means that many statistical methods (e.g., factor analysis) that use covariance matrices for analysis become inapplicable. Also, when the sum is a constant, it turns the positive correlation between some variables into a negative correlation (Clemans, 1966; Dunlap & Cornwell, 1994).
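The constant-sum property described above can be checked numerically. The following is a minimal sketch, assuming hypothetical random rankings of four items; the helper name `covariance_matrix` and the data are illustrative and are not taken from the study.

```python
import random


def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) for a list of equal-length rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(k)]
        for i in range(k)
    ]


# Hypothetical data: 200 respondents each rank four items (1 = most preferred),
# so every respondent's scores sum to the same constant, 1 + 2 + 3 + 4 = 10.
rng = random.Random(0)
rankings = [rng.sample([1, 2, 3, 4], 4) for _ in range(200)]

cov = covariance_matrix(rankings)

# Each row of the covariance matrix sums to (numerically) zero, which means the
# matrix is singular and cannot be inverted.
row_sums = [sum(row) for row in cov]
```

Because each variance on the diagonal is positive while every row sums to zero, the off-diagonal covariances in a row must be negative on balance, which illustrates how the constant sum induces negative correlations among items.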
There have been many attempts to solve the problems associated with ipsative measures. Jackson and Alwin (1980) suggested a way to transform ipsative measures, based on the assumption that an ipsative measure is obtained by subtracting the mean from the original data values. However, not all ipsative measures are obtained this way; for example, ranking involves comparing items instead of subtracting the mean, and hence the method suggested by Jackson and Alwin only works for certain types of ipsative measures. Other attempted solutions include Chan and Bentler (1998), who proposed a method based on covariance structure analysis for ranking data, and Brown and Maydeu-Olivares (2011, 2013), who reparameterized the second-order Thurstonian factor model into a first-order factor model and proposed using the Thurstonian IRT model to analyze ranking and paired comparison data. However, the statistical techniques of Brown and Maydeu-Olivares (2011, 2013) are subject to limitations in practice. For example, their algorithms cannot handle inventories that include a larger number of items (e.g., 23 blocks with 138 items) at the same time, because large quantities of items cause huge numbers of comparisons and even more estimated parameters, which surpass the handling capacity of their algorithms.
Using visual analogue scales to improve the Likert-type scale
Another major issue with the use of Likert-type scales is the ambiguous number of response categories. One commonly used method for avoiding this disadvantage is using a VAS (Flynn et al., 2004; Guyatt et al., 1987). A VAS is typically presented as a horizontal line, anchored with two verbal descriptors at the extremes where respondents indicate their perceived status by placing a mark along the horizontal line at the most appropriate point (Wewers & Lowe, 1990). VASs are easy to understand, administer, and score, especially when the VAS is implemented with a computer (Couper, Tourangeau, Conrad, & Singer, 2006; Wewers & Lowe, 1990; Yusoff & Janor, 2014). There are several important psychometric features of a VAS. First, the line continuum of a VAS enables the rater to make more fine-grained (Chimi & Russell, 2009) responses without the constraints of direct quantitative terms (Wewers & Lowe, 1990), and thus measurement data with higher variability will be obtained, which theoretically enhances their reliability (Cook et al., 2001; Krieg, 1999). This resolves the drawback of Likert-type scales, which have coarse-grained discrete measurement data produced by only three to seven categories. Second, VAS may provide interval-level measurements that are eligible for more statistical operations. The interval-level scale can be defined as a numeric scale on which people may assign numbers to objects in such a way that numerically equal distances on the scale represent equal distances between the features/characteristics of the objects being measured. Researchers have provided evidence for the interval-level measurement of VAS (e.g., Price, McGrath, Rafii, & Buckingham, 1983). Recently, Reips and Funke (2008) designed experiments based on judgments of equal intervals in psychophysics (Stevens, 1946, 1951) and provided evidence that participants’ responses to a VAS possess the property of an interval-level scale.
Third, because of the high variability of a VAS, researchers and practitioners need not bother to determine the number of response categories (Flynn et al., 2004; Funke & Reips, 2012; Guyatt et al., 1987; Jaeschke et al., 1990; Kuhlmann, Dantlgraber, & Reips, 2017).
Despite the advantages mentioned above, several features of VASs need to be investigated. For example, whether the reliability and validity of VASs outperform those of Likert-type scales remains controversial, especially when different delivering tools are involved (e.g., computer-based vs. paper-and-pencil; Couper et al., 2006; Kuhlmann et al., 2017; Wewers & Lowe, 1990). Furthermore, most VASs have been administered in the format of a single item coupled with a single question; that is, each item was composed of a target attribute (or trait, statement, description, question, etc.) to be rated, along with the line continuum. This may result in absolute judgments along the continuous scale, and thus unsatisfactory reliability (e.g., Ferrando, 2003; Munshi, 2014). Both psychologists and psychometricians (e.g., Laming, 2004; Nunnally, 1967) have proposed that humans are much better at making comparative judgments than at making absolute judgments. Since multiple attributes can be located on the line continuum of a VAS simultaneously, for both ranking and paired comparison, the feasibility and psychometric properties of using a VAS for ranking and paired comparison are worthy of investigation, especially because doing so would effectively duplicate all of the functionalities present in Likert-type scales.
Using transformations to address issues with Likert-type scales
To overcome the psychometric issues of Likert-type scales, several researchers (e.g., Granberg-Rademacker, 2010; Harwell & Gatti, 2001; Wu, 2007) have proposed transformation methods to scale ordinal Likert-type data before statistical estimation or hypothesis testing. These methods utilize different mathematical models and mechanisms to rescale ordinal Likert-type data to interval data. For example, Harwell and Gatti applied item response theory (IRT) to map the discrete total scores obtained by test-takers onto an interval-scaled proficiency. They argued that a nonlinear transformation of the IRT method would produce data that are not only interval-scale measures but also approximately normally distributed and suitable for statistical procedures. More recently, Granberg-Rademacker proposed a Markov chain Monte Carlo scaling technique that converts ordinal measurements to interval measurements. Finally, Wu applied Snell’s method to transform 4- and 5-point Likert-type scales into numerical scores. Snell’s method assumes an underlying continuous scale of measurement and that the underlying continuous distributions follow a logistic function. Wu argued that the transformed data better followed the assumption of normality.
However, even researchers adopting such transformation approaches have acknowledged the complexity and difficulty of their transforming operations (e.g., Harwell & Gatti, 2001; Wu, 2007); because these procedures require extensive mathematical and statistical professional knowledge, the transformations are complicated to handle for people without a background in statistics or psychometrics. Moreover, mathematical models with many additional assumptions are required when applying the transformations. The different mechanisms underlying the mathematical models make it difficult to evaluate the accuracy of the data after the transformation (Yusoff & Janor, 2014). In addition, the improvement offered by such transformations is uncertain; for instance, many indices of factor analysis have not demonstrated much difference between Likert-type scales and transformed Likert-type scales (Wu, 2007).
The VASRRP
Components of the VASRRPs and their usage
While using a VASRRP, if there is only a single item in a testlet, the respondent first checks the item and then indicates its appropriate position on the line continuum by dragging and dropping the item onto the scale, which is similar to the response to an item on a typical VAS. If there are multiple items in a testlet, respondents can repeat the procedure described above for a single item several times, until all the items in the testlet are located on the line continuum. During the process, respondents are allowed to move any item freely on the line and to do plenty of comparisons, until the relative positions of all items on the line match up to the respondent’s opinions. Meanwhile, different items in the testlet are not allowed to be marked at the same point, which assures that the VASRRP can be used as a comparison method. The scores of each item are calculated on the basis of the coordinates on the line continuum, which are represented by the pixels on the computer screen. Specifically, if \( x_1 \) and \( x_2 \) represent the two endpoint coordinates on the continuum, and the respondent places an item at \( x_3 \), the score for the item is calculated as \( \frac{x_3 - x_1}{x_2 - x_1} \), which ranges from 0 to 1, indicating the level of intensity or strength of the item. Note that linear transformations can be used. For example, scores can be rescaled to fall within the range of [0, 100], or shifted to an interval with 0 as the midpoint, such as [–1, 1]. Because the value for a participant’s response can be any number within the chosen range, a VASRRP, like a VAS, can be considered a very fine-grained scale.
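The scoring rule just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `vasrrp_score`, its parameters, and the pixel values are hypothetical and do not come from the VASRRP Generator.

```python
def vasrrp_score(x_item, x_left, x_right, low=0.0, high=1.0):
    """Convert an item's pixel coordinate into a score in [low, high].

    x_left and x_right are the endpoint coordinates of the line continuum
    (x_1 and x_2 in the text); x_item is the item's position (x_3).
    low and high select the linear rescaling, e.g. (0, 100) or (-1, 1).
    """
    if x_right == x_left:
        raise ValueError("the two endpoints must differ")
    proportion = (x_item - x_left) / (x_right - x_left)
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("the item lies outside the line continuum")
    return low + proportion * (high - low)


# Hypothetical example: a line running from pixel 100 to pixel 600,
# with an item dropped at pixel 350 (the midpoint).
raw = vasrrp_score(350, 100, 600)                        # scale [0, 1]
percent = vasrrp_score(350, 100, 600, low=0, high=100)   # scale [0, 100]
centered = vasrrp_score(350, 100, 600, low=-1, high=1)   # scale [-1, 1]
```

Rejecting positions outside the line is a design choice made for this sketch; an implementation could equally clamp such values to the nearest endpoint.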
Figure 1 shows two example VASRRP scales. Figure 1a is a testlet with two items. The respondent compares the two items on the basis of their perceived importance, and then indicates the items on the continuum. In addition, the midpoint of the continuum helps the respondent differentiate whether or not an item is considered important. In Fig. 1a, the respondent indicated that one item is important and the other is not. Figure 1b is a testlet that has four items (A, B, C, and D) representing four different styles of learning. The respondent has rated how similar each of the learning styles is to his or her own personal learning. The respondent lists the styles as A, C, B, and D, in order of decreasing similarity with his/her own ways of learning. The figure shows that the respondent considers A and C to be quite similar to his/her learning style, whereas B and D are not. Note that the respondent’s indication of B is closest to the midpoint of the continuum. The figure also shows that the respondent considers the difference between A and C to be slight, and the differences between B and C and between B and D to be larger.
Features of the VASRRP
1. Similar to the response format of the VAS, the VASRRP can elicit respondents’ fine-grained responses on a line continuum.
2. In the response format of a VASRRP with multiple items in each testlet, respondents can implement comparative judgments for the items in each testlet. Compared with the criticized “absolute judgment” function of a single-item VAS (Goffin & Olson, 2011) and Likert-type scales (Sheppard, Goffin, Lewis, & Olson, 2011), the comparative judgment function of VASRRPs not only provides respondents with a more authentic measurement tool for human judgments (Laming, 2004; Nunnally, 1967) but also realizes the ideal of collecting more diverse types of data, such as rating, ranking and paired comparison, in a single operation.
3. Although VASRRPs can be implemented in a context of comparisons, the total score of the summed items is not a constant, which is different from the traditional ipsative scales with the same total summed scores (i.e., a constant). Thus, many statistical procedures that cannot be administered to ipsative data can be applied to VASRRP-produced measurements. Furthermore, as compared with ranking or paired comparisons, which may only produce qualitatively different information among items (e.g., A > B > C) after certain transformation methods (e.g., Granberg-Rademacker, 2010; Harwell & Gatti, 2001; Wu, 2007), VASRRPs can not only provide this qualitative information, but also quantify the degree of difference among those items, because the position of each item on the line continuum is clearly indicated and on the same spectrum. This quantitative information will not only help researchers find out the exact differences among ranked items, but also help clearly identify the inclination of a participant’s attitude (e.g., positive or negative, like or dislike, important or unimportant), which can be shown by observing if the averaged scale score is above or below the midpoint. Such clarification is important for scales such as work values or career interest; however, it cannot be achieved through ranking or paired comparison, because those methods do not have a reference point for comparisons (McCloy et al., 1999a).
4. Other types of scales can be viewed as special cases of the VASRRP. For example, if the VASRRP has only one item in each testlet, the VASRRP can be used as a graphic rating scale or a VAS; this format of VASRRP can also be used as a Likert-type scale by assigning categories (e.g., five or seven terms for describing the intensity) to the line continuum for responses and calculating the scores. For the format of a VASRRP with two or more items, the VASRRP can function as a ranking or paired-comparison task, because the ordering positions of all those items on the line continuum reveal information about ranks, and the relative positions of each item reveal information about paired comparisons. Moreover, using a VASRRP for implementing paired-comparison tasks reduces the load for respondents, in contrast to the traditional paired-comparison task, in which \( \binom{n}{2} \) item comparisons are needed for n items. With VASRRP the respondent only needs to read the items on a testlet and consider their relative positions on the line continuum, which saves time and energy.
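To illustrate how one set of positions on the line can yield rating, ranking, and paired-comparison information at once, the following sketch derives a rank order and all pairwise differences from a testlet's scores. The function name `ranking_and_pairs` and the scores are hypothetical; the scores were chosen only so that the resulting order (A, C, B, D) matches the example in Figure 1b.

```python
from itertools import combinations


def ranking_and_pairs(scores):
    """Derive a ranking and paired comparisons from one testlet.

    scores maps each item label to its position score on the continuum.
    Returns (ranking, pairs): ranking lists items from highest to lowest
    score; pairs maps each item pair (a, b) to scores[a] - scores[b], so a
    positive value means a was placed above b.
    """
    if len(set(scores.values())) != len(scores):
        raise ValueError("items in a testlet may not share a position")
    ranking = sorted(scores, key=scores.get, reverse=True)
    pairs = {
        (a, b): scores[a] - scores[b]
        for a, b in combinations(sorted(scores), 2)
    }
    return ranking, pairs


# Hypothetical ratings for a four-item testlet on a [0, 1] continuum.
testlet = {"A": 0.90, "B": 0.45, "C": 0.85, "D": 0.10}
ranking, pairs = ranking_and_pairs(testlet)
```

With four items, the dictionary `pairs` holds all six comparisons that a traditional paired-comparison task would have to present one at a time, and each comparison carries a signed magnitude rather than only a direction.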
Analysis of VASRRP
Specifically, in data from VASRRPs with multiple items, the scores of each item will be affected by three factors: latent variables, measurement error, and the context effects of comparisons, which are the mutual influences of the items in the same testlet. Although the design of the testlets will help respondents make comparative judgments and might avoid response-style biases, it is noteworthy that when the procedure of model fitting is applied, the context effect within a testlet may reduce the accuracy of the parameter estimations (Holyk, 2008). However, we can take context effects into account in statistical analyses in order to obtain more accurate results. For example, the correlated-traits–correlated-uniqueness model (CTCU model; Marsh, 1989; Marsh & Bailey, 1991) is one of the statistical models that can be applied to take the contextual factors into account.
The CTCU model, developed for confirmatory factor analysis (CFA), has been primarily used for multitrait–multimethod (MTMM) data processing (Marsh & Bailey, 1991). It sets correlated trait factors, whereby method effects are inferred from correlations of the error terms (Tomás, Oliver, & Hontangas, 2002). As compared with the trait-only model (the CT model), which posits trait factors but no method effects, the CTCU model infers the method effects from the correlated uniqueness among the measured variables on the basis of the same methods (Marsh & Grayson, 1995). Adopting the idea from CTCU, in the present study we inferred the item score correlations and context effects that resulted from inter-item comparisons in the same testlet from the correlations of measurement errors. Another reason for applying the CTCU model is that incorrect solutions, such as a variance < 0 or a correlation > 1 or < –1, are less likely to occur during model fitting (Marsh, 1989; Tomás et al., 2002). The software LISREL or Mplus can be utilized directly to estimate parameters or evaluate the goodness of fit of the model.
To sum up, on the basis of the data features of the VASRRP described above, there are three approaches to analyzing VASRRP data: The first one is to use an IRT model or factor analysis to rescale the VASRRP data, and then apply statistical procedures to analyze these scaled data. Alternatively, since the VASRRP elicits respondents’ fine-grained responses on a line continuum, and the estimators obtained from fine-grained data will be less biased than those derived from Likert-type scales and ranking (Bollen & Barb, 1981; Krieg, 1999), statistical procedures such as the t test, F test, and analysis of variance, or descriptive statistics such as the mean, standard deviation, and correlation coefficient of a VASRRP, could be applied. Moreover, VASRRPs can be used to investigate the relationships among unobservable latent constructs and measured variables, such as through CFA or structural equation modeling (SEM), which may not be eligible for use with ranked data sets.
Simulation and empirical studies of the VASRRP
To demonstrate the advantages of using VASRRPs, two simulations were first performed in this study: In Simulation 1 we compared VASRRPs with Likert-type scales, and in Simulation 2 we compared VASRRPs with ranking, in terms of both parameter recovery and model fit. Next, we also performed an analytical comparison of empirical data from the Situation-Based Career Interest Assessment (SCIA; Sung, Cheng, & Hsueh, 2017; Sung, Cheng, & Wu, 2016) and evaluated the efficacy of the VASRRP. Two sets of empirical data obtained using the VASRRP and Likert-type scales were then analyzed to demonstrate the differences between these scales.
Simulation Study 1: VASRRPs versus Likert-type scales
Likert-type scales are widely criticized because they use only a small number of response categories for the measurement of latent variables. When the latent variables are fine-grained data, the use of Likert-type scales results in measurement errors. In Simulation 1 we examined the extent to which model fit and parameter recovery are affected by such errors.
Methods
Data of simulation
Two types of simulated data were used that were based on the research objectives of this study: those with and without the context of comparison effects. The first type of data was generated by the CTCU model to simulate a testlet comprising data with the context effects, whereas the second type of data was generated by the correlated-trait model (CT model). Correlated error terms in the CTCU model can represent the context effects (as described in the previous section about the analysis of the VASRRP, as well as shown in Fig. 2, which used three latent variables as an example; in this simulation, however, we used four latent variables instead of three). The error terms of the CT model, in contrast, are not correlated, so the CT model can simulate data that do not exhibit the context effects. While generating the two types of data, we also applied different models for the analysis. There were four latent variables, each of which had either four or eight items. Following the empirical studies of Sung, Cheng, and Wu (2016) and the simulation settings of Brown and Maydeu-Olivares (2012), factor loadings among the latent variables and the items were set to range from 0.60 to 1.20 in each simulation. The coefficient for the correlation among the latent variables was .1 or .3, with the correlation being stronger between two adjacent variables (Holland, 1997). In the CTCU model, the correlation coefficient for the error terms was also set as .1 or .3, with the correlation again being stronger between two adjacent variables.
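The difference between the two data-generating models can be sketched as follows. This is a simplified illustration, not the Mplus setup used in the study: it assumes one common correlation among all latent variables (`phi`) and one common correlation among the error terms within a testlet (`rho`) instead of the .1/.3 adjacency pattern, fixes every error variance at 1, and uses hypothetical names and loadings throughout.

```python
import math
import random


def simulate_testlets(n_resp, n_testlets, loadings, phi, rho, seed=0):
    """Simulate item scores; each testlet holds one item per latent variable.

    loadings: one factor loading per latent variable.
    phi: common correlation among the latent variables (assumption).
    rho: common correlation among error terms within a testlet;
         rho = 0 gives CT-type data, rho > 0 gives CTCU-type data.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_resp):
        shared_trait = rng.gauss(0, 1)
        traits = [
            math.sqrt(phi) * shared_trait + math.sqrt(1 - phi) * rng.gauss(0, 1)
            for _ in loadings
        ]
        row = []
        for _ in range(n_testlets):
            context = rng.gauss(0, 1)  # context effect shared within the testlet
            for lam, eta in zip(loadings, traits):
                error = math.sqrt(rho) * context + math.sqrt(1 - rho) * rng.gauss(0, 1)
                row.append(lam * eta + error)
        rows.append(row)
    return rows


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


loadings = [0.6, 0.8, 1.0, 1.2]  # hypothetical values within the 0.60-1.20 range
ct = simulate_testlets(2000, 4, loadings, phi=0.3, rho=0.0)
ctcu = simulate_testlets(2000, 4, loadings, phi=0.3, rho=0.3)

# Correlation between the first two items of the first testlet under each model.
r_ct = pearson([row[0] for row in ct], [row[1] for row in ct])
r_ctcu = pearson([row[0] for row in ctcu], [row[1] for row in ctcu])
```

In this sketch, two items from the same testlet correlate more strongly under the CTCU-type settings than under the CT-type settings, because the shared context term adds covariance beyond what the latent variables explain.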
The CTCU model can generate VASRRP data ranging from 0 to 1, whereas the CT model generates continuous data that, through the use of cut points, can be transformed into Likert-scale data (Krieg, 1999; Nyren et al., 1987). The simulated data were generated using Mplus, with the default data based on a standard normal distribution and within a range of [–3, 3]. In the process of simulating Likert-type scales, we used {–2, 2}, {–1, 1}, {–3, –1, 1, 3}, and {–1.5, –0.5, 0.5, 1.5} as the cut points to represent two types of 3-point Likert-type scales and two types of 5-point Likert-type scales. Note that Likert-type scales with an identical number of response categories that are cut at different values can be used to mimic different category descriptions.
In all simulation scenarios the sample size was 500, and each simulation was run 500 times. For convenience, in the CT model, we use “xLyI” to represent a model that has x latent variables, with each latent variable containing y items. In the CTCU model, we use xLyI to represent a model that has x latent variables and y testlets, with each latent variable containing y items. For the Likert-type scales, 4L8I and 4L4I represent models that have four latent variables, with each latent variable containing eight and four items, respectively. For VASRRP scales, 4L8I and 4L4I represent models that have four latent variables with eight or four testlets, each containing four items.
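The cut-point step can be sketched as below. The function names are hypothetical, the rule that a value lying exactly on a cut point goes to the higher category is an assumption, and the expected category shares ignore the truncation to [–3, 3] mentioned above.

```python
from bisect import bisect_right
from statistics import NormalDist


def to_likert(value, cut_points):
    """Return the 1-based response category for a continuous value.

    cut_points must be ascending; k cut points give k + 1 categories.
    A value exactly on a cut point goes to the higher category (assumption).
    """
    return bisect_right(cut_points, value) + 1


def category_proportions(cut_points):
    """Expected share of each category for a standard normal underlying score."""
    cdf = [0.0] + [NormalDist().cdf(c) for c in cut_points] + [1.0]
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]


three_point_wide = [-2, 2]             # one of the two 3-point settings
five_point = [-1.5, -0.5, 0.5, 1.5]    # one of the two 5-point settings

categories = [to_likert(v, five_point) for v in (-2.0, -1.0, 0.0, 1.0, 2.0)]
middle_share = category_proportions(three_point_wide)[1]
```

Under these assumptions, roughly 95% of standard normal scores fall into the middle category when the cut points are {–2, 2}, which shows how strongly the choice of cut points can compress the variability of the resulting Likert-type data.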
Analysis
This study used Mplus 7.0 for further analysis because it provides rapid data simulation. However, this version is not equipped with the principal component method to estimate factor loadings, which is the only method that does not require the covariance matrix to be nonsingular. Therefore, comparisons in Simulation 1 do not include the use of ranking scales. The description of Simulation 2 provides a comparison between a ranking scale and a VASRRP.
Results
Model fit
Reliabilities of different scales: Cronbach’s alpha (SE)
Model  Scale  Cut points  Latent Variable 1  Latent Variable 2  Latent Variable 3  Latent Variable 4
4L4I  VASRRP scale  n/a  .713 (.022)  .796 (.015)  .736 (.020)  .811 (.014)
4L4I  Likert-type scale  {–2, 2}  .482 (.046)  .612 (.035)  .518 (.044)  .638 (.031)
4L4I  Likert-type scale  {–1, 1}  .637 (.027)  .728 (.020)  .663 (.025)  .746 (.019)
4L4I  Likert-type scale  {–3, –1, 1, 3}  .651 (.026)  .745 (.018)  .677 (.025)  .763 (.018)
4L4I  Likert-type scale  {–1.5, –0.5, 0.5, 1.5}  .685 (.024)  .770 (.016)  .709 (.022)  .785 (.016)
4L8I  VASRRP scale  n/a  .820 (.011)  .873 (.008)  .836 (.011)  .882 (.007)
4L8I  Likert-type scale  {–3, –1, 1, 3}  .778 (.014)  .842 (.010)  .796 (.014)  .853 (.009)
4L8I  Likert-type scale  {–1.5, –0.5, 0.5, 1.5}  .802 (.012)  .857 (.008)  .817 (.012)  .866 (.008)
Model fit indices of the scales
Values in parentheses are as reported alongside each index in the original table.

| Model | Scale | Cut points | RMSEA | SRMR | CFI | TLI | χ² | df |
|---|---|---|---|---|---|---|---|---|
| 4L4I | VASRRP scale | | .008 (.009) | .027 (.003) | .998 (.003) | .999 (.007) | 75.56 (12.365) | 74 |
| 4L4I | Likert-type scale | {–2, 2} | .010 (.009) | .033 (.003) | .987 (.015) | .991 (.026) | 103.276 (14.972) | 98 |
| 4L4I | Likert-type scale | {–1, 1} | .008 (.009) | .030 (.003) | .995 (.007) | .999 (.013) | 99.374 (14.924) | 98 |
| 4L4I | Likert-type scale | {–3, –1, 1, 3} | .007 (.009) | .030 (.003) | .996 (.006) | .999 (.012) | 98.783 (14.345) | 98 |
| 4L4I | Likert-type scale | {–1.5, –0.5, 0.5, 1.5} | .008 (.009) | .029 (.003) | .996 (.005) | .998 (.010) | 100.686 (14.494) | 98 |
| 4L8I | VASRRP scale | | .006 (.006) | .032 (.002) | .998 (.003) | .998 (.005) | 421.251 (28.093) | 410 |
| 4L8I | Likert-type scale | {–3, –1, 1, 3} | .006 (.006) | .034 (.002) | .996 (.005) | .997 (.007) | 468.645 (29.420) | 458 |
| 4L8I | Likert-type scale | {–1.5, –0.5, 0.5, 1.5} | .007 (.006) | .033 (.002) | .996 (.004) | .997 (.007) | 471.686 (30.333) | 458 |
Composite reliabilities of the different scales
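Values are composite reliabilities.

| Model | Scale | Cut points | Latent Variable 1 | Latent Variable 2 | Latent Variable 3 | Latent Variable 4 |
|---|---|---|---|---|---|---|
| 4L4I | VASRRP scale | | .718 | .799 | .741 | .815 |
| 4L4I | Likert-type scale | {–2, 2} | .495 | .620 | .529 | .646 |
| 4L4I | Likert-type scale | {–1, 1} | .642 | .731 | .667 | .748 |
| 4L4I | Likert-type scale | {–3, –1, 1, 3} | .657 | .749 | .683 | .767 |
| 4L4I | Likert-type scale | {–1.5, –0.5, 0.5, 1.5} | .689 | .772 | .713 | .787 |
| 4L8I | VASRRP scale | | .815 | .889 | .851 | .898 |
| 4L8I | Likert-type scale | {–3, –1, 1, 3} | .774 | .856 | .811 | .868 |
| 4L8I | Likert-type scale | {–1.5, –0.5, 0.5, 1.5} | .798 | .871 | .811 | .881 |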
Parameter recovery
Parameter recoveries obtained by the VASRRP scale and Likert scales with different cut points in the 4L4I model
Factor loadings: mean estimate, with SE in parentheses.

| Parameter | True value | VASRRP scale | Likert-type, cut points {–2, 2} | Likert-type, cut points {–1, 1} | Likert-type, cut points {–3, –1, 1, 3} | Likert-type, cut points {–1.5, –0.5, 0.5, 1.5} |
|---|---|---|---|---|---|---|
| λ1 | 0.65 | .651 (.058) | .109 (.027) | .307 (.034) | .325 (.037) | .598 (.056) |
| λ2 | 0.75 | .746 (.054) | .135 (.029) | .345 (.031) | .372 (.034) | .671 (.054) |
| λ3 | 0.85 | .851 (.062) | .168 (.032) | .389 (.034) | .426 (.039) | .752 (.057) |
| λ4 | 0.95 | .946 (.064) | .196 (.037) | .422 (.034) | .471 (.039) | .819 (.055) |
| λ5 | 1.15 | 1.149 (.062) | .261 (.030) | .488 (.030) | .571 (.036) | .947 (.051) |
| λ6 | 1.05 | 1.047 (.058) | .229 (.028) | .457 (.029) | .521 (.033) | .887 (.048) |
| λ7 | 0.95 | .948 (.057) | .197 (.027) | .424 (.031) | .474 (.035) | .822 (.051) |
| λ8 | 0.85 | .847 (.057) | .166 (.027) | .387 (.029) | .423 (.033) | .750 (.051) |
| λ9 | 0.70 | .695 (.057) | .120 (.032) | .326 (.033) | .384 (.035) | .630 (.055) |
| λ10 | 0.80 | .796 (.059) | .149 (.034) | .367 (.033) | .397 (.037) | .714 (.054) |
| λ11 | 0.90 | .892 (.064) | .179 (.037) | .404 (.033) | .447 (.038) | .783 (.056) |
| λ12 | 1.00 | .998 (.063) | .212 (.045) | .439 (.035) | .497 (.040) | .855 (.055) |
| λ13 | 1.20 | 1.194 (.061) | .275 (.029) | .504 (.030) | .594 (.037) | .975 (.050) |
| λ14 | 1.10 | 1.095 (.062) | .244 (.029) | .471 (.030) | .544 (.035) | .916 (.050) |
| λ15 | 1.00 | .998 (.060) | .212 (.029) | .441 (.031) | .499 (.034) | .855 (.053) |
| λ16 | 0.90 | .900 (.058) | .183 (.027) | .406 (.030) | .449 (.034) | .787 (.051) |

Correlation matrix of latent variables (upper triangle: correlations; lower-triangle entries appear in bold, as in the original table):

True value: \( \left[\begin{array}{cccc} & .3 & .1 & .3\\ & & .3 & .1\\ & & & .3\\ & & & \end{array}\right] \)

Estimates, VASRRP scale: \( \left[\begin{array}{cccc} & .299 & .095 & .299\\ \mathbf{.055} & & .297 & .099\\ \mathbf{.065} & \mathbf{.053} & & .299\\ \mathbf{.054} & \mathbf{.056} & \mathbf{.055} & \end{array}\right] \)

Estimates, Likert-type scale with cut points {–2, 2}: \( \left[\begin{array}{cccc} & .288 & .088 & .282\\ \mathbf{.088} & & .281 & .098\\ \mathbf{.091} & \mathbf{.089} & & .283\\ \mathbf{.085} & \mathbf{.071} & \mathbf{.091} & \end{array}\right] \)

Estimates, Likert-type scale with cut points {–1, 1}: \( \left[\begin{array}{cccc} & .297 & .096 & .296\\ \mathbf{.063} & & .295 & .098\\ \mathbf{.071} & \mathbf{.062} & & .296\\ \mathbf{.062} & \mathbf{.062} & \mathbf{.064} & \end{array}\right] \)

Estimates, Likert-type scale with cut points {–3, –1, 1, 3}: \( \left[\begin{array}{cccc} & .299 & .096 & .300\\ \mathbf{.061} & & .298 & .100\\ \mathbf{.069} & \mathbf{.059} & & .300\\ \mathbf{.060} & \mathbf{.061} & \mathbf{.062} & \end{array}\right] \)

Estimates, Likert-type scale with cut points {–1.5, –0.5, 0.5, 1.5}: \( \left[\begin{array}{cccc} & .297 & .096 & .295\\ \mathbf{.059} & & .295 & .096\\ \mathbf{.068} & \mathbf{.056} & & .296\\ \mathbf{.057} & \mathbf{.058} & \mathbf{.059} & \end{array}\right] \)

Correlation matrix of error (estimates are listed for the VASRRP scale only):

True value, Testlets 1 to 4 (identical for each testlet): \( \left[\begin{array}{cccc} & .3 & .1 & .3\\ & & .3 & .1\\ & & & .3\\ & & & \end{array}\right] \)

Estimates, Testlet 1: \( \left[\begin{array}{cccc} & .299 & .102 & .301\\ \mathbf{.059} & & .301 & .101\\ \mathbf{.051} & \mathbf{.059} & & .303\\ \mathbf{.059} & \mathbf{.062} & \mathbf{.058} & \end{array}\right] \)

Estimates, Testlet 2: \( \left[\begin{array}{cccc} & .299 & .098 & .296\\ \mathbf{.058} & & .301 & .099\\ \mathbf{.054} & \mathbf{.057} & & .294\\ \mathbf{.058} & \mathbf{.059} & \mathbf{.058} & \end{array}\right] \)

Estimates, Testlet 3: \( \left[\begin{array}{cccc} & .304 & .104 & .300\\ \mathbf{.059} & & .303 & .102\\ \mathbf{.055} & \mathbf{.057} & & .301\\ \mathbf{.058} & \mathbf{.053} & \mathbf{.060} & \end{array}\right] \)

Estimates, Testlet 4: \( \left[\begin{array}{cccc} & .301 & .099 & .297\\ \mathbf{.058} & & .303 & .099\\ \mathbf{.062} & \mathbf{.061} & & .301\\ \mathbf{.054} & \mathbf{.053} & \mathbf{.059} & \end{array}\right] \)
Summary
On the basis of several indices, such as Cronbach’s alpha, parameter recovery, and composite reliability, this study shows that the measurement errors caused by ordinal scales, such as Likert-type scales, clearly affect estimation and reduce composite reliability. In contrast, VASRRPs do not have these problems and yield more satisfactory parameter recovery, composite reliability, and Cronbach’s alpha values, especially when compared with Likert-type scales that are as coarse as three points.
Simulation Study 2: VASRRPs versus ranking
Given that ranking scales are ipsative and thus create singular covariance matrices, most statistical techniques are not applicable to such scales. In Simulation 2, we used exploratory factor analysis (EFA) with the principal component method (Dunlap & Cornwell, 1994; Loo, 1999) to estimate parameters, and then compared the model fit and parameter recovery between the VASRRP and the ranking scale.
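The singularity of the covariance matrix follows directly from the fixed total of ranking data. As a brief sketch, assuming a complete ranking of m items without ties (the symbols r, m, Σ, and 1 are introduced here for illustration and are not the authors’ notation):

```latex
% r = (r_1, ..., r_m)^T : one respondent's ranks of the m items
% Sigma = Cov(r)        : covariance matrix of the ranks across respondents
% 1                     : the m-vector of ones
\[
  \mathbf{1}^{\top}\mathbf{r} \;=\; \sum_{i=1}^{m} r_i \;=\; \frac{m(m+1)}{2}
  \qquad \text{(the same constant for every respondent)}
\]
\[
  \Sigma\,\mathbf{1}
  \;=\; \operatorname{Cov}\!\left(\mathbf{r},\ \mathbf{1}^{\top}\mathbf{r}\right)
  \;=\; \mathbf{0}
  \quad\Longrightarrow\quad
  \det \Sigma = 0, \qquad \operatorname{rank}(\Sigma) \le m - 1 .
\]
```

Because every row of Σ sums to zero, Σ cannot be inverted, which is why methods that require a nonsingular covariance matrix do not apply to raw ranking data.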
Methods
Simulation data
This study randomly selected one of the 4L4I and 4L8I datasets of the VASRRP generated by the CTCU model in Simulation Study 1. The numeric values of each item on a VASRRP can be transformed into ranking data through their orders on the VASRRP continuum. Since the results of the two datasets were similar, to save space, this section only presents the analysis and results for dataset 4L4I.
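The conversion from VASRRP positions to ranks can be sketched as follows. This is a minimal illustration rather than the authors’ procedure: the function name `vas_to_ranks`, the convention that rank 1 denotes the lowest position on the continuum, the absence of ties, and the uniform random stand-in data are all assumptions made here.

```python
import numpy as np

def vas_to_ranks(testlet_scores):
    """Rank the items within each row (one testlet per row); 1 = lowest position."""
    scores = np.asarray(testlet_scores, dtype=float)
    order = np.argsort(scores, axis=1)            # item indices from lowest to highest
    ranks = np.empty_like(order)
    rows = np.arange(scores.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, scores.shape[1] + 1)
    return ranks

rng = np.random.default_rng(1)                    # arbitrary seed
vas = rng.random((200, 4))                        # stand-in data: 200 respondents, 4 items
ranks = vas_to_ranks(vas)

# Every respondent's ranks sum to the same constant (1 + 2 + 3 + 4 = 10),
# so each row of the covariance matrix of the ranks sums to (numerically) zero.
cov = np.cov(ranks, rowvar=False)
row_sums = cov.sum(axis=1)
```

The VASRRP values themselves carry no such constraint, so their covariance matrix is not forced to be singular.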
Analysis
In Simulation 2 we used SPSS to apply EFA in order to compare differences in model fit and parameter recovery for the VASRRP and the ranking data. We compared the model fit of the scales based on the proportion of variance explained (PVE), Cronbach’s alpha, and factor structure. Estimates of parameter recovery for the factor loadings and the correlation of the latent variables were also evaluated.
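For readers who want to reproduce the two descriptive indices outside SPSS, the sketch below computes Cronbach’s alpha from its standard formula and a PVE from the eigenvalues of the item correlation matrix. The function names and the stand-in data are hypothetical, and treating PVE as the share of the largest eigenvalues of the correlation matrix is an assumption about the principal component method, not a description of the SPSS settings used in the study.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) array of one subscale."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    sum_item_var = items.var(axis=0, ddof=1).sum()   # sum of the item variances
    total_var = items.sum(axis=1).var(ddof=1)        # variance of the summed score
    return k / (k - 1) * (1.0 - sum_item_var / total_var)

def proportion_variance_explained(items, n_components):
    """Share of total variance carried by the largest principal components."""
    corr = np.corrcoef(np.asarray(items, dtype=float), rowvar=False)
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]     # sorted from largest to smallest
    return eigenvalues[:n_components].sum() / eigenvalues.sum()

# Stand-in data: four noisy indicators of a single common factor.
rng = np.random.default_rng(2)                       # arbitrary seed
factor = rng.standard_normal((500, 1))
demo_items = factor + 0.8 * rng.standard_normal((500, 4))

alpha = cronbach_alpha(demo_items)
pve_one_component = proportion_variance_explained(demo_items, 1)
```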
Results
Factor structure and parameter recovery
Factor structures of different scales
True Value  Ranking Scale  VASRRP  

Component  Component  Component  
Factor 1  Factor 2  Factor 3  Factor 4  Factor 1  Factor 2  Factor 3  Factor 4  Factor 1  Factor 2  Factor 3  Factor 4  
V11  .65  .662  .749  
V21  1.15  .823  .783  
V31  .70  .748  .722  
V41  1.20  – .416  – .377  .828  
V12  .75  – .358  .839  .730  
V22  1.05  .842  .759  
V32  .80  .741  .783  
V42  1.10  – .372  – .342  – .356  .813  
V13  .85  – .777  .711  
V23  .95  .726  .832  
V33  .90  .761  .775  
V43  1.00  – .559  .381  .767  
V14  .95  .532  – .447  .775  
V24  .85  .321  .651  .697  
V34  1.00  .734  .771  
V44  .90  – .687  .784 
A comparison of the factor loadings of the two scales listed in Table 5 shows that those of the VASRRP were generally more desirable than those obtained for the ranking scale, because the VASRRP estimates were closer to the true values. When the ranking scale was adopted, some of the factor loading estimates were negative and far from the true values of 0.6 to 1.2.
Parameter recovery for the different scales in terms of correlation of the latent variables
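| Latent trait | True value: 2 | True value: 3 | True value: 4 | VASRRP: 2 | VASRRP: 3 | VASRRP: 4 | Ranking scale: 2 | Ranking scale: 3 | Ranking scale: 4 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | .300 | .100 | .300 | .220 | .224 | .125 | .204 | .206 | –.128 |
| 2 | | .300 | .100 | | .320 | .274 | | .346 | .113 |
| 3 | | | .300 | | | .275 | | | –.230 |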
Cronbach’s alpha
Reliability and proportions of variance explained (PVEs) for the different scales
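| Scale | Cronbach’s alpha: Factor 1 | Cronbach’s alpha: Factor 2 | Cronbach’s alpha: Factor 3 | Cronbach’s alpha: Factor 4 | PVE |
|---|---|---|---|---|---|
| Ranking | .636 | .696 | .689 | .733 | 58.85% |
| VASRRP | .731 | .773 | .764 | .811 | 59.66% |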
Summary
Our findings indicate that the ipsative data produced by ranking limit statistical analysis, leading to unsatisfactory parameter recovery for the factor loadings and incorrect estimates of the correlations of the latent variables. Our results indicate that the use of a VASRRP can avoid these unwanted effects.
Empirical Study 1: Comparing the VASRRP and Likert scales for career interest assessment
In this study, the model fit, reliability, PVE, composite reliability, leniency biases, and covariance matrices from the participants’ actual responses were compared through empirically collected data from the VASRRP and Likert scales.
Methods
Assessment tool and data collection
Another data set was also obtained by using a Likert-type scale, for comparison. The Likert-type scale asked the same respondents to rate their preference or aversion for each of 54 items displayed on a computer screen by responding on the following 5-point scale: very unfavorable, unfavorable, neutral, favorable, and very favorable. A counterbalanced design was used in which about half of the respondents performed their ratings using the VASRRP before proceeding to the 5-point Likert-type scale, whereas the other respondents used the Likert-type scale first. It was not necessary to collect ranking data since they could be obtained simply by transforming the VASRRP data. This study collected 1,749 valid samples from 9th-grade junior high school students (average age 15.2), of whom 933 were male and 816 were female. All the students’ parents approved of their children’s participation in the research before data collection commenced.
Analysis
We first analyzed the model fit. The CTCU and CT models were used for the VASRRP and Likerttype data, respectively. Furthermore, the reliability, PVE, and composite reliability were also analyzed. We also compared the three scales in terms of their leniency biases, and differences in covariance matrices. The leniency bias refers to whether bias or errors existed in the respondents’ ratings and rankings, and this was calculated by comparing the mean and median values for the six interest types—a greater difference indicates a larger leniency bias (Chiu & Alliger, 1990) and that the respondents are more likely to provide overstated or understated ratings. A comparison of covariance matrices helps in examining whether the additional comparison procedure in the VASRRP affects the covariance matrix in a way similar to what happens when a rating scale is applied. Finally, the amounts of time participants required in order to complete the scales were also compared.
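One possible reading of the leniency index described above is the signed difference between the mean and the median of the respondents’ scores on a subscale. The sketch below implements that reading only; the function name, the sign convention (mean minus median), and the toy scores are assumptions, and the text does not settle whether the authors applied any further scaling.

```python
import statistics

def leniency_bias(subscale_scores):
    """Mean minus median of the scores on one subscale (assumed sign convention)."""
    return statistics.fmean(subscale_scores) - statistics.median(subscale_scores)

# Hypothetical subscale scores on a 0-1 continuum.
symmetric = [0.1, 0.5, 0.9]     # mean equals median, so the index is zero
right_skewed = [0.2, 0.4, 0.9]  # mean above median, so the index is positive

bias_symmetric = leniency_bias(symmetric)
bias_skewed = leniency_bias(right_skewed)
```

Under this reading, an index near zero indicates ratings that are distributed symmetrically around their center, whereas a larger absolute value indicates systematically overstated or understated ratings.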
Results
Model fit, reliability, and PVE
Model fit indices of different scales
| Index | Likert-type (CT model) | VASRRP (CTCU model) |
|---|---|---|
| RMSEA | .079 | .080 |
| CFI | .923 | .936 |
| TLI | .919 | .925 |
| SRMR | .095 | .099 |
Reliabilities and proportions of variance explained (PVEs) for the different scales
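| Latent trait | Cronbach’s alpha: VASRRP scale | Cronbach’s alpha: Likert-type scale | Cronbach’s alpha: Ranking scale | Composite reliability: VASRRP scale | Composite reliability: Likert-type scale |
|---|---|---|---|---|---|
| R | .918 | .912 | .879 | .997 | .910 |
| I | .900 | .891 | .807 | .997 | .926 |
| A | .856 | .836 | .795 | .997 | .924 |
| S | .847 | .836 | .737 | .998 | .929 |
| E | .854 | .830 | .657 | .997 | .898 |
| C | .834 | .812 | .673 | .998 | .917 |
| PVE | 55.75% | 52.96% | 44.18% | | |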
Leniency bias
Leniency bias values for the different scales
| Scale | R | I | A | S | E | C |
|---|---|---|---|---|---|---|
| VASRRP scale | .003 | .005 | .004 | .004 | .003 | –.003 |
| Likert-type scale | .012 | .004 | –.010 | –.012 | .007 | –.014 |
| Ranking scale | –.029 | –.022 | .000 | .006 | –.015 | –.014 |
Covariance matrix
Covariance matrices of different scales
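| | Ranking: R | Ranking: I | Ranking: A | Ranking: S | Ranking: E | Ranking: C | VASRRP: R | VASRRP: I | VASRRP: A | VASRRP: S | VASRRP: E | VASRRP: C |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| R | .09 | .00 | –.03 | –.03 | –.02 | –.02 | .05 | .01 | .00 | .00 | .01 | .00 |
| I | .00 | .07 | –.01 | –.02 | –.02 | –.02 | .01 | .05 | .01 | .00 | .01 | .00 |
| A | –.03 | –.01 | .09 | .00 | –.03 | –.02 | .00 | .01 | .05 | .01 | .00 | .00 |
| S | –.03 | –.02 | .00 | .06 | –.01 | .00 | .00 | .00 | .01 | .03 | .00 | .00 |
| E | –.02 | –.02 | –.03 | –.01 | .07 | .00 | .01 | .01 | .00 | .00 | .04 | .02 |
| C | –.02 | –.02 | –.02 | .00 | .00 | .06 | .00 | .00 | .00 | .00 | .02 | .03 |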
Time to completion
The participants took 919.65 s (SD = 229.95) on average to complete the VASRRP and 461.18 s (SD = 119.43) on average to complete the Likert scale. The paired t test revealed a significant difference [t(1748) = –86.23, p < .01] between the amounts of time spent on the two scales.
Summary
The empirical data produced results similar to those of the two simulation studies. Using the VASRRP produced higher reliability and PVE. Moreover, with the comparison function of items in the same testlet, the VASRRP also reduced leniency bias, which may have resulted from the longer time spent engaging with the scale. Although the VASRRP shares the comparison function of ranking and paired comparison, the data it collected were not ipsative like those produced by ranking and paired comparison, and thus retained the appropriate properties of covariance matrices, which enabled further statistical analyses such as factor analysis.
Empirical Study 2: Comparing the VAS and VASRRP for career interest assessment
This study compared the reliability, leniency biases, and time latency from the participants’ responses for the VAS and VASRRP.
Method
Assessment tool and data collection
The SCIA, which was introduced in Empirical Study 1, was used in this study. Another data set was also obtained using a VAS, for comparison. Instead of using a testlet for comparing and ranking items, the VAS version of the SCIA individually and randomly presented the 54 items to each participant. In this study we collected two data sets from two groups of participants: The first data set included 246 valid samples of 9th-grade junior high school students (average age 14.9; 132 females and 114 males) for the SCIA VAS; the second included 251 9th graders (average age 15.1; 118 females and 133 males) for the SCIA VASRRP. All of the students’ parents approved of their children’s participation in the research before data collection began.
Analysis
The analyses of reliability, leniency biases, and time latency were identical to the methods used in Empirical Study 1.
Results
Reliability coefficients of Cronbach’s alpha and leniency bias for the VAS and the VASRRP
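| Type | Cronbach’s α: VAS (N = 246) | Cronbach’s α: VASRRP (N = 251) | Leniency: VAS (N = 246) | Leniency: VASRRP (N = 251) |
|---|---|---|---|---|
| R | .944 | .939 | .0219 | –.0026 |
| I | .955 | .941 | .0212 | –.0034 |
| A | .940 | .938 | .0012 | .0039 |
| S | .945 | .936 | .0160 | .0030 |
| E | .956 | .923 | .0105 | –.0001 |
| C | .960 | .928 | .0120 | –.0020 |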
Constructing the VASRRPs

Step 1: Determine the number of items in each testlet. The number of items in each testlet will determine the task for the participants and the data collected. As we have mentioned, the VASRRP can be used for rating, ranking, and paired comparisons. If there is only one item in each testlet, then the VASRRP is identical to a regular VAS, and the task that participants need to execute is simply rating the item on the line continuum. If there are two items in each testlet, then the participants need to execute the paired-comparison task by dragging and dropping the items onto the line continuum. If there are three or more items in each testlet, then the participants need to execute the ranking task by dragging and dropping the items onto the line continuum. Researchers may determine the items in each testlet according to their theoretical constructs or their practical needs. For example, researchers may need the two-item paired-comparison format because they need to construct a scale for bipolar personality traits (e.g., introvert vs. extravert); or they may need a six-item ranking format for the hexagonal model of Holland’s (1997) interest types; or they may want to compare the same feature of four brands of cars. Generally we recommend that the items in each testlet cover all the dimensions/factors of a certain psychological construct. For example, if there are six dimensions of a work-value theory, then six items representing the six dimensions are recommended to be included in the same testlet. The first item represents the first dimension/factor of the construct to be investigated, the second item represents the second dimension/factor of the construct, and so on. The positions of those items will be randomly presented. Researchers can use the drop-down menu to determine their items in each testlet.

Step 2: Determine the question in each testlet. Each testlet should contain one question that asks participants to express their feelings, attitudes, or opinions, such as “How would you like the vacations below?”, “Which brand of car do you like the most?”, or “In your work environment, which one below would you value most?” Depending on the purposes and needs of the researchers, the scores of the items representing the same dimension in different testlets can usually be summed to give a subtotal score for that dimension’s subscale, or the scores of the different dimensions/subscales can be summed to give the total score of the whole scale. Therefore, the same question may be applied to different testlets so long as the items differ. Questions can also be altered across different testlets to increase the diversity of expression (such as replacing the question “In your work environment, which one below would you value most?” with “Which company offer below attracts you most?”). However, researchers have to ensure that different questions across testlets elicit responses belonging to the same target variable.

Step 3: Determine the content of items in each testlet. Each of the items in a testlet should be presented as a verbal statement (e.g., “watch repairer” as a kind of vocation) or as graphics/pictures (e.g., pictures showing the working environment of a watch repairer).

Step 4: Determine the anchors for the scale in each testlet. On the right and left ends of the line continuum scale, there are two anchors for guiding participants’ expressions of their levels of feeling, attitudes, or opinions. The two anchors are usually bipolar verbs (e.g., agree, disagree) or adjectives (e.g., pleasant, unpleasant), which represent two increasingly opposite levels of attitudes, thoughts, or feelings. Usually the same anchors can be applied to different items and testlets.

Step 5: Determine the number of testlets in the whole scale. Usually a scale will include several testlets, based on how many items would be enough to measure the psychological construct, opinions, or attitudes with acceptable reliability and validity.

Step 6: Preview the scale. Using the “Preview and Record” button, researchers may test the scale they have constructed in advance to see whether it can fulfill their needs. They can revise the Excel template if they need to revise the scale. Researchers may also change the style of the scale, such as the length, width, and colors of the line continuum or the shape and colors of the icons, by using the “Chang Style” function. The testing data, which are the positions of each item on the line continuum, will be converted to values ranging from 0 to 1 as the score of each item, and then will be exported to an Excel output file for the researchers’ reference.

Step 7: Administer the scale. After the researchers confirm the number and content of items in each testlet, as well as the number of testlets in the whole scale, they may submit the scale for administration. Researchers need to create a file name and instruction for the scale, which will be used for identification and explanation of the scale. They also need to create a password with which their participants will be allowed to access the scale. After these procedures, researchers can inform their study’s participants of the URL (i.e., www.vasrrp.net), the name of the scale, and the password for the scale. Then, their participants may log onto the website and press the “Take a VASRRP survey” button to respond to the assigned scale. The responses of each participant, which are the positions of each item on the line continuum (Fig. 7), will be converted to values ranging from 0 to 1 as the score of each item and will then be exported to an Excel output file.
After the administration of their survey, through the “Preview and Record” button on the website, using the created file name of the scale and the password for accessing the records, researchers may download the aggregated data of all the participants’ responses in the exported Excel file. In the file, each row includes a participant’s number, the date and time of taking the survey, and their scores of each item in each testlet, which are arranged in the order testlet1_item1, testlet1_item2 . . . testlet2_item1, testlet2_item2, and so on.
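As an illustration of how the exported scores might be handled, the sketch below converts a position on the line continuum to a 0 to 1 score and sums the items that represent the same dimension across testlets. It is not part of the VASRRP Generator: the function names, the pixel coordinates, and the example row are hypothetical, and the code assumes that item i in every testlet represents dimension i, as recommended in Step 1, and that columns follow the testlet1_item1 naming pattern described above.

```python
import re
from collections import defaultdict

COLUMN = re.compile(r"testlet(\d+)_item(\d+)")

def position_to_score(x, line_start, line_end):
    """Linearly rescale a position on the line continuum to the range 0-1."""
    return (x - line_start) / (line_end - line_start)

def subscale_totals(row):
    """Sum scores over testlets for each item index (assumed to equal the dimension)."""
    totals = defaultdict(float)
    for name, value in row.items():
        match = COLUMN.fullmatch(name)
        if match is None:
            continue  # skip non-score columns such as participant number or date
        dimension = int(match.group(2))
        totals[dimension] += value
    return dict(totals)

# Hypothetical exported row with two testlets of two items each.
example_row = {
    "participant": 7,
    "testlet1_item1": 0.8, "testlet1_item2": 0.2,
    "testlet2_item1": 0.6, "testlet2_item2": 0.5,
}
totals = subscale_totals(example_row)
```

Summing the per-dimension totals then gives the total score of the whole scale, if such a score is meaningful for the construct being measured.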
General discussion
When encountering the issues surrounding the limitations of Likert-type scales, such as response styles and ordinal measurement data, researchers may adopt four approaches (Brown, 2014; Spooren, Mortelmans & Thijssen, 2012; Tabachnick & Fidell, 2001): The first one is ignoring the problems and treating all ordinal data as interval. The second approach is changing the format of the scales, such as choosing scales with comparison functions, such as ranking, to overcome the response-style biases caused by using Likert-type scales, but ignoring the problems of ipsative measures (e.g., Kolb, 2005; McCloy et al., 1999b). The third method is using a VAS to obtain fine-grained measurements to avoid the measurement errors of Likert-type scales, but accepting that the data may still contain response-style biases and encounter problems with absolute judgments (Wewers & Lowe, 1990). The fourth approach is developing mathematical models coupled with paired comparison or ranking to overcome the limitations of ipsative data (e.g., Brady, 1989; Brown & Maydeu-Olivares, 2011, 2013; Chan & Bentler, 1998; Jackson & Alwin, 1980), while enduring the added burden such methods entail. Despite their possible contributions, all of these proposed methods introduce new problems along with their solutions.
The VASRRP proposed in this study offers a fifth approach for overcoming the difficulties researchers encounter. In addition to the convenience of freeing researchers/practitioners from being concerned with the issues of the optimal number of points (categories) on the Likert-type scale (Alwin, 1992; Cox, 1980; McKelvie, 1978; Preston & Colman, 2000), the VASRRP’s finer-grained measurements improved on the psychometric properties of Likert-type scales, and the Cronbach’s alpha, parameter recovery, and composite reliability values were all substantially enhanced. These findings provide more converging evidence for previous claims (e.g., Babakus et al., 1987; Krieg, 1999) that coarse-grained and ordinal data, such as those produced by Likert-type scales, are more prone to measurement errors and reduced reliability. However, our expectation that a fine-grained scale, such as a VASRRP, would have superior reliability was not completely fulfilled. First, in our simulation studies, the reliability of 4L8I was similar to the reliability of the VASRRP and was better than that of 4L4I, which indicates that a larger number of items in a scale may alleviate the problem of discrete response bias. Second, the simulation results revealed that the reliability of the VASRRP was not significantly higher than that of the 5-point Likert scale; our empirical study also found that the VASRRP only significantly outperformed the Likert scale in two thirds of the subscales. These findings provide support for the previous findings that fine-grained scales were not necessarily superior to coarse-grained scales in terms of reliability (Kuhlmann et al., 2017; McKelvie, 1978). More simulated and empirical studies with different types of designs are needed to clarify these mixed findings.
Another feature of the VASRRP is that instead of using a single item for judgment in each scale as in a traditional VAS, the VASRRP employed a multi-item (i.e., a testlet) format along with each scale. This innovation not only made the traditional VAS become a special case of the VASRRP, but also brought about several advantages. First, the multi-item VASRRP enabled more possible types of scaling, such as ranking and paired comparison, when compared to the traditional VAS, which allows only for rating. The multi-item format also allowed respondents to make relative judgments instead of absolute judgments, which should reduce measurement error (Laming, 2004; Nunnally, 1967). Our empirical study showed that the multi-item testlet format of the VASRRP effectively reduced response-style bias when compared with a similar Likert-type scale by enabling relative judgments of career interests. This functionality is especially beneficial for psychological tests that focus on revealing within-individual differences among dimensions of traits, such as styles, interests, or values. This advantage was illustrated by the fact that the multi-item VASRRP helped reduce leniency bias in our two empirical studies. As compared with either Likert scales or VASs, which were not able to curtail participants’ response styles, the VASRRP elicited less leniency bias, which may have resulted from participants spending more time judging their relative preferences for those items shown on the line continuum. However, it is noteworthy that the longer response latencies for the VASRRP than for the VAS may also represent a disadvantage, since previous studies using paired-comparison formats have been criticized for being too time-consuming (e.g., McCloy et al., 1999a).
Since the comparison of response latencies for the VAS and VASRRP resulted from our second empirical study, which was a between-subjects design, more rigorous designs, such as a within-subjects design along with think-aloud protocols regarding participants’ mental processes of comparison, would help uncover more facts about the different mental operations at work while taking a VASRRP or VAS.
Second, integrating the multi-item testlet format with the fine-grained measurements of the VAS allowed quantitative comparisons of targeted traits in ranking and paired-comparison tasks, for which only qualitative comparisons were allowed traditionally. Furthermore, the raw data for comparisons produced by the VASRRP could be more meaningful than Likert scale, ranking, or paired-comparison scores when calculating regular statistics such as means, standard deviations, correlations, and covariance matrices, with no concern for the problems associated with an identical summed total score across participants and the singular covariance matrices produced by traditional ranking and paired-comparison tasks. In our simulation and empirical studies, the raw data produced by traditional ipsative methods, such as rankings and paired comparisons, clearly demonstrated the limitations mentioned above. However, such disadvantages were alleviated by the VASRRP, as more satisfactory covariance matrices and better parameter recovery for the factor loadings and the correlations of the latent variables were found in the VASRRP data.
Third, despite their ipsative nature, coupled with appropriate models such as CTCU, the VASRRP data were appropriate for model fitting and theory testing. This resolved the limitations of traditional ranking and paired comparisons, which could not produce data eligible for model fitting. Our simulation and empirical studies also demonstrated satisfactory parameter recovery using the VASRRP. When fitting VASRRP data with the CTCU model to explore or confirm theories, they can provide higher reliability than ranking data by modeling the relationships of the latent variables, measurement error, and the context effects in the same testlet, simultaneously. Although our findings supported the usefulness of the VASRRP data for overcoming the limitations of using ranking and paired-comparison tasks in model fitting, the model fit indices of the VASRRP did not outperform those from Likert-type scales in the present studies. More research with different psychological traits and different VASRRP designs will be needed to explore the capability of VASRRP designs to enhance the construct validity of scales. Furthermore, as the VASRRP was presented in a testlet format, the drag-and-drop operation of items and the line continuum with a neutral point represents a special arrangement different from the traditional VAS. Whether this affects the generalizability of our present research results to other VAS formats will be worthy of more consideration in future research.
On the basis of their multiple functions, ease of use, and eligibility for various statistical analyses, VASRRPs can be easily applied to existing assessment tools and may subsequently overcome some of the limitations posed by Likert-type, visual analogue, or ranking scales. For example, the Minnesota Importance Questionnaire (Gay et al., 1971) and the Kolb Learning Style Inventory (Kolb, 2005) are both ipsative measures; however, VASRRP data can be obtained by slightly changing the methods used by respondents to provide answers/indications. Another example is the Gordon Personal Profile Inventory (GPPI; Gordon, 1993), in which the scoring is performed by partial ranking: Respondents have to select two items out of four (i.e., the most like me and the least like me), and a considerable amount of item information is lost. Such information loss would not occur if we used the VASRRP to produce the ranking data in the GPPI. Furthermore, a VASRRP can also work in place of a Likert-type scale by arranging items according to latent variables and using its graphic rating scale to calculate scores. For example, the original NEO Personality Inventory (Costa & McCrae, 1992) uses a Likert-type scale to measure five different types of personality traits and the Work Value Assembly (Sung, Chang, Cheng, & Tien, 2017) uses a Likert-type scale to measure seven dimensions of work values. We can replace the Likert-type scale with a VASRRP by forming testlets with five items corresponding to each of the five personality types and seven dimensions of work values.
In addition to identifying diverse possible applications for VASRRPs, this study suggests several avenues for future research. The first is related to the functions of VASRRPs. VASRRPs incorporate forced choice into a testlet design to reduce or prevent response styles and socially desirable responses (or faking), but several issues remain to be clarified. Are the forced-choice scores of the VASRRP more precise than those from VAS (i.e., a single-item VASRRP) rating scales? Participants may have more difficulty comparing large numbers of items at once, which would reduce precision. The optimal number of items in a testlet therefore remains an important research question. Additionally, whether ranking or comparisons really reduce or prevent socially desirable responding remains an open question that further research should test. Finally, the original VAS format does not include a midpoint. The addition of a midpoint to the VASRRP may have distorted participants' responses, and how much distortion, if any, it created is an open issue. Another issue is the nonoverlapping requirement that gives the VASRRP format its forced-choice function. Will rating behaviors be affected by the forced nonoverlapping of specific positions on the line continuum? Investigating these problems would provide more evidence for when and how the VASRRP is most advantageous. Another avenue is to compare the functionality of the VAS and the VASRRP. Despite the finding that the VASRRP may elicit less leniency bias and deeper engagement than the VAS, the VASRRP did not show higher reliability than the VAS. More types of VASRRPs (e.g., with different items in a testlet or different psychological constructs) need to be compared with VASs to reveal their actual differences.
Future research could also compare bias, validity, and reliability between scaled scores, obtained by using IRT models to scale VASRRP scores, and the original, nonscaled VASRRP scores. Finally, it would be worthwhile to investigate methods of strengthening VASRRP data analysis. For example, the CTCU model is not the only one that can be employed to handle context effects; the correlated-traits–uncorrelated-methods model for processing MTMM data or the correlated-traits–correlated-methods model (Widaman, 1985) could also be adopted for the analysis of VASRRP data. Further comparisons of the pros and cons of these different models will be required.
Notes
Acknowledgements
The collection of empirical data was supported by the Ministry of Science and Technology (MOST 104-2511-S-003-012-MY3) and the Higher Education Sprout Project of the Ministry of Education.
References
Albaum, G. (1997). The Likert scale revisited: An alternate version. Journal of the Market Research Society, 39, 331–348.
Allen, I. E., & Seaman, C. A. (2007). Likert scales and data analyses. Quality Progress, 40, 64–65.
Alwin, D. F. (1992). Information transmission in the survey interview: Number of response categories and the reliability of attitude measurement. In P. V. Marsden (Ed.), Sociological methodology (pp. 83–118). Cambridge, MA: Blackwell.
Babakus, E., Ferguson, C. E., & Jöreskog, K. G. (1987). The sensitivity of confirmatory maximum likelihood factor analysis to violations of measurement scale and distributional assumptions. Journal of Marketing Research, 37, 72–141.
Bagozzi, R. P., & Yi, Y. (1988). On the evaluation of structural equation models. Journal of the Academy of Marketing Science, 16, 74–94.
Baron, H. (1996). Strengths and limitations of ipsative measurement. Journal of Occupational and Organizational Psychology, 69, 49–56.
Bollen, K. A. (1989). Structural equation models. New York, NY: Wiley.
Bollen, K. A., & Barb, K. H. (1981). Pearson’s r and coarsely categorized measures. American Sociological Review, 46, 232–239.
Brady, H. E. (1989). Factor and ideal point analysis for interpersonally incomparable data. Psychometrika, 54, 181–202.
Brown, A. (2014). Item response models for forced-choice questionnaires: A common framework. Psychometrika, 81, 1–26.
Brown, A., & Maydeu-Olivares, A. (2011). Item response modeling of forced-choice questionnaires. Educational and Psychological Measurement, 71, 460–502.
Brown, A., & Maydeu-Olivares, A. (2012). Fitting a Thurstonian IRT model to forced-choice data using Mplus. Behavior Research Methods, 44, 1135–1147. https://doi.org/10.3758/s13428-012-0217-x
Brown, A., & Maydeu-Olivares, A. (2013). How IRT can solve problems of ipsative data in forced-choice questionnaires. Psychological Methods, 18, 36–52.
Carmines, E. G., & McIver, J. P. (1981). Analyzing models with unobserved variables: Analysis of covariance structure. In G. W. Bohrnstedt & E. F. Borgatta (Eds.), Social measurement: Current issues (pp. 65–115). Beverly Hills, CA: Sage.
Chan, W., & Bentler, P. M. (1998). Covariance structure analysis of ordinal ipsative data. Psychometrika, 63, 369–399.
Cheung, M. W. L., & Chan, W. (2002). Reducing uniform response bias with ipsative measurement in multiple-group confirmatory factor analysis. Structural Equation Modeling, 9, 55–77.
Chimi, C. J., & Russell, D. L. (2009, November). The Likert-type scale: A proposal for improvement using quasi-continuous variables. Paper presented at the ISECON 2009, Washington, DC.
Chiu, C. K., & Alliger, G. M. (1990). A proposed method to combine ranking and graphic rating in performance appraisal: The quantitative ranking scale. Educational and Psychological Measurement, 50, 493–503.
Clemans, W. V. (1966). An analytical and empirical examination of some properties of ipsative measures (Psychometric Monograph No. 14). Richmond, VA: Psychometric Society. Retrieved from www.psychometrika.org/journal/online/MN14.pdf
Cook, C., Heath, F., Thompson, R., & Thompson, B. (2001). Score reliability in web- or internet-based surveys: Unnumbered graphic rating scales versus Likert-type scales. Educational and Psychological Measurement, 61, 697–706.
Costa, P. T., & McCrae, R. R. (1992). Professional manual: Revised NEO personality inventory (NEO-PI-R) and NEO five-factor inventory (NEO-FFI). Odessa, FL: Psychological Assessment Resources.
Couper, M. P., Tourangeau, R., Conrad, F. G., & Singer, E. (2006). Evaluating the effectiveness of visual analog scales: A Web experiment. Social Science Computer Review, 24, 227–245.
Cox, E. P. (1980). The optimal number of response alternatives for a scale: A review. Journal of Marketing Research, 17, 407–422.
Cunningham, W. H., Cunningham, I. C. M., & Green, R. T. (1977). The ipsative process to reduce response set bias. Public Opinion Quarterly, 41, 379–384.
Diedenhofen, B., & Musch, J. (2016). Cocron: A web interface and R package for the statistical comparison of Cronbach’s alpha coefficients. International Journal of Internet Science, 11, 51–60.
Dunlap, W. P., & Cornwell, J. M. (1994). Factor analysis of ipsative measures. Multivariate Behavioral Research, 29, 115–126.
Ferrando, P. J. (2003). A kernel density analysis of continuous typical-response scales. Educational and Psychological Measurement, 63, 809–824.
Flynn, D., van Schaik, P., & van Wersch, A. (2004). A comparison of multi-item Likert and visual analogue scales for the assessment of transactionally defined coping function. European Journal of Psychological Assessment, 20, 49–58.
Funke, F., & Reips, U.-D. (2012). Why semantic differentials in Web-based research should be made from visual analogue scales and not from 5-point scales. Field Methods, 24, 310–327.
Gay, E. G., Weiss, D. J., Hendel, D. D., Dawis, R. V., & Lofquist, L. H. (1971). Manual for the Minnesota importance questionnaire (No. 54). Work Adjustment Project, University of Minnesota.
Goffin, R. D., & Olson, J. M. (2011). Is it all relative? Comparative judgments and the possible improvement of self-ratings and ratings of others. Perspectives on Psychological Science, 6, 48–60.
Gordon, L. V. (1993). Gordon personal profile inventory (GPP-I): Manual. San Antonio, TX: Psychological Corporation.
Granberg-Rademacker, J. S. (2010). An algorithm for converting ordinal scale measurement data to interval/ratio scale. Educational and Psychological Measurement, 70, 74–90.
Greenleaf, E. A. (1992). Measuring extreme response style. Public Opinion Quarterly, 56, 328–351.
Guyatt, G. H., Townsend, M., Berman, L. B., & Keller, J. L. (1987). A comparison of Likert and visual analogue scales for measuring change in function. Journal of Chronic Diseases, 40, 1129–1133.
Harwell, M. R., & Gatti, G. G. (2001). Rescaling ordinal data to interval data in educational research. Review of Educational Research, 71, 105–131.
Hicks, L. E. (1970). Some properties of ipsative, normative, and forced-choice normative measures. Psychological Bulletin, 74, 167–184.
Holland, J. L. (1997). Making vocational choices: A theory of vocational personalities and work environments. Odessa, FL: Psychological Assessment Resources.
Holyk, G. G. (2008). Context effect. In P. J. Lavrakas (Ed.), Encyclopedia of survey research methods (p. 142). Thousand Oaks, CA: Sage. https://doi.org/10.4135/9781412963947.n98
Hooper, D., Coughlan, J., & Mullen, M. R. (2008). Structural equation modelling: Guidelines for determining model fit. Electronic Journal of Business Research Methods, 6, 53–60.
Jackson, D. J., & Alwin, D. F. (1980). The factor analysis of ipsative measures. Sociological Methods and Research, 9, 218–238.
Jaeschke, R., Singer, J., & Guyatt, G. H. (1990). A comparison of seven-point and visual analogue scales. Controlled Clinical Trials, 11, 43–51.
Jamieson, S. (2004). Likert scales: How to (ab)use them. Medical Education, 38, 1212–1218.
Kolb, A. Y. (2005). The Kolb learning style inventory—Version 3.1: 2005 technical specifications. Boston, MA: Hay Resources Direct.
Krieg, E. F. (1999). Biases induced by coarse measurements scales. Educational and Psychological Measurement, 59, 749–766.
Kuhlmann, T., Dantlgraber, M., & Reips, U.-D. (2017). Investigating measurement equivalence of visual analogue scales and Likert-type scales in Internet-based personality questionnaires. Behavior Research Methods, 49, 2173–2181. https://doi.org/10.3758/s13428-016-0850-x
Laming, D. (2004). Human judgment: The eye of the beholder. London, UK: Thomson.
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140, 5–55.
Loo, R. (1999). Confirmatory factor analyses of Kolb’s learning style inventory (LSI-1985). British Journal of Educational Psychology, 69, 213–219.
Marsh, H. W. (1989). Confirmatory factor analyses of multitrait–multimethod data: Many problems and a few solutions. Applied Psychological Measurement, 13, 335–361.
Marsh, H. W., & Bailey, M. (1991). Confirmatory factor analyses of multitrait–multimethod data: A comparison of alternative models. Applied Psychological Measurement, 15, 47–70.
Marsh, H. W., & Grayson, D. (1995). Latent variable models of multitrait–multimethod data. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications (pp. 177–198). Thousand Oaks, CA: Sage.
McCloy, R., Waugh, G., Medsker, G., Wall, J., Rivkin, D., & Lewis, P. (1999a). Development of the O*NET computerized work importance profiler. Raleigh, NC: National Center for O*NET Development.
McCloy, R., Waugh, G., Medsker, G., Wall, J., Rivkin, D., & Lewis, P. (1999b). Development of the O*NET paper-and-pencil work importance locator. Raleigh, NC: National Center for O*NET Development.
McKelvie, S. J. (1978). Graphic rating scales: How many categories? British Journal of Psychology, 69, 185–202.
Meade, A. W. (2004). Psychometric problems and issues involved with creating and using ipsative measures for selection. Journal of Occupational and Organizational Psychology, 77, 531–551.
Munshi, J. (2014). A method for constructing Likert scales. Research report, Sonoma State University. Retrieved from www.munshi.sonoma.edu/likert.html
Myles, P. S., Troedel, S., Boquest, M., & Reeves, M. (1999). The pain visual analog scale: Is it linear or nonlinear? Anesthesia and Analgesia, 89, 1517–1520.
Nunnally, J. C. (1967). Psychometric theory. New York, NY: McGraw-Hill.
Nyren, O., Adami, O., Bates, S., Bergstrom, R., Gustavsson, S., Loof, L., & Sjoden, P. O. (1987). Self-rating of pain in nonulcer dyspepsia. Journal of Clinical Gastroenterology, 9, 408–414.
Tomás, J. M., Oliver, A., & Hontangas, P. M. (2002). Linear confirmatory models for MTMM matrices: The case of several indicators per trait–method combinations. In S. P. Shohov (Ed.), Advances in psychology research (Vol. 10, pp. 99–122). Huntington, NY: Nova Science.
Paulhus, D. L. (1981). Control of social desirability in personality inventories: Principal-factor deletion. Journal of Research in Personality, 15, 383–388.
Paulhus, D. L. (1991). Measures of personality and social psychological attitudes. In J. P. Robinson & R. P. Shaver (Eds.), Measures of social psychological attitudes series (Vol. 1, pp. 17–59). San Diego, CA: Academic.
Preston, C. C., & Colman, A. M. (2000). Optimal number of response categories in rating scales: Reliability, validity, discriminating power, and respondent preferences. Acta Psychologica, 104, 1–15.
Price, D. D., McGrath, P. A., Rafii, A., & Buckingham, B. (1983). The validation of visual analogue scales as ratio scale measures for chronic and experimental pain. Pain, 17, 45–56.
Randall, D. M., & Fernandes, M. F. (1991). The social desirability response bias in ethics research. Journal of Business Ethics, 10, 805–817.
Raykov, T. (1997). Estimation of composite reliability for congeneric measures. Applied Psychological Measurement, 21, 173–184.
Reips, U.-D., & Funke, F. (2008). Interval-level measurement with visual analogue scales in Internet-based research: VAS Generator. Behavior Research Methods, 40, 699–704. https://doi.org/10.3758/BRM.40.3.699
Rounds, J. B., Miller, T. W., & Dawis, R. V. (1978). Comparability of multiple rank order and paired comparison methods. Applied Psychological Measurement, 2, 415–422.
Scherpenzeel, A. C., & Saris, W. E. (1997). The validity and reliability of survey questions. Sociological Methods & Research, 25, 341–383.
Sheppard, L. D., Goffin, R. D., Lewis, R. J., & Olson, J. (2011). The effect of target attractiveness and rating method on the accuracy of trait ratings. Journal of Personnel Psychology, 10, 24–33.
Spooren, P., Mortelmans, D., & Thijssen, P. (2012). “Content” versus “style”: Acquiescence in student evaluation of teaching? British Educational Research Journal, 38, 3–21.
Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677–680. https://doi.org/10.1126/science.103.2684.677
Stevens, S. S. (1951). Mathematics, measurement, and psychophysics. In S. S. Stevens (Ed.), Handbook of experimental psychology (pp. 1–49). New York, NY: Wiley.
Sung, Y.-T., Chang, Y.-T. Y., Cheng, T.-Y., & Tien, H.-L. S. (2017). Development and validation of a work values scale for assessing high school students: A mixed methods approach. European Journal of Psychological Assessment. Advance online publication. https://doi.org/10.1027/1015-5759/a000408
Sung, Y.-T., Cheng, Y. W., & Hsueh, J. H. (2017). Identifying the career-interest profiles of junior-high-school students through latent profile analysis. Journal of Psychology, 151, 229–246.
Sung, Y.-T., Cheng, Y. W., & Wu, J. S. (2016). Constructing a situation-based career interest assessment for junior-high-school students and examining their interest structure. Journal of Career Assessment, 24, 347–365.
Tabachnick, B. G., & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Boston, MA: Allyn & Bacon.
Viswanathan, M., Bergen, M., Dutta, S., & Childers, T. (1996). Does a single response category in a scale completely capture a response? Psychology and Marketing, 13, 457–479.
Wewers, M. E., & Lowe, N. K. (1990). A critical review of visual analogue scales in the measurement of clinical phenomena. Research in Nursing and Health, 13, 227–236.
Widaman, K. F. (1985). Hierarchically nested covariance structure models for multitrait–multimethod data. Applied Psychological Measurement, 9, 1–26.
Wu, C. H. (2007). An empirical study on the transformation of Likert-type scale data to numerical scores. Applied Mathematical Sciences, 1, 2851–2862.
Yusoff, R., & Janor, R. M. (2014). Generation of an interval metric scale to measure attitude. Sage Open, 4, 1–16.
Zimmerman, D. W., Zumbo, B. D., & Lalonde, C. (1993). Coefficient alpha as an estimate of test reliability under violation of two assumptions. Educational and Psychological Measurement, 53, 33–49.
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.