The impact of procedural and distributive justice on satisfaction and manufacturing performance: a replication of Lindquist (1995) with a focus on the importance of common metrics in experimental design

This paper replicates Lindquist’s (Lindquist, Journal of Management Accounting Research 7:122–147, 1995) seminal research introducing the concepts of justice to the accounting literature. We use organizational justice theory, as he did, to replicate his study and, in doing so, question some findings of partial replications and extensions done over the past 25 years. We do this because work built on his study has challenged some of his findings. We believe these challenges have resulted from most researchers using different research metrics than Lindquist did. Many of these extensions have also used a mental-based task, instead of a manual-based one, in their experiments. We believe this constrains the ability to draw inferences and conclusions from this subsequent research, and that this constraint extends to much of the experimental research in the social sciences. In our research we exactly replicate Lindquist’s (1995) operationalizations of voice and vote and measure dependent outcomes for four of the same conditions he investigated. In contrast to most other follow-up studies, we find, as Lindquist did, that having a voice only leads to significantly enhanced satisfaction with high-stretch targets, as compared to having a vote only or no input. We also corroborate Lindquist’s (1995) result that having a voice only leads to significantly greater satisfaction with the experimental task, as compared to participants with a vote only or no input. Additionally, unlike Lindquist (1995), we find participants allowed only a voice significantly outperform participants with a vote only and no input. We thus support Lindquist’s findings of a fair process effect for voice and perceptions of pseudo-participation related to vote.


Introduction
task, simulating manufacturing, should recreate the routine nature of industrial settings. Shin and Grant (2019) support this contention, noting that a representative manufacturing task should induce fatigue and boredom. The problem is that, except for a few studies (Chow et al. 2001; Lindquist 1995), all experimental research examining the impacts of justice/injustice on manufacturing workers has employed Chow's (1983) mental-based symbol decoding task. Mental-based tasks like this can generate a higher level of interest and intrinsic motivation than manual ones. And with them, effort may become its own reward (Amabile 1993; Keller and Bless 2008).
Our experimental findings mirror those of Lindquist (1995). We find that allowing participants to help set their performance standards leads to positive outcomes at low process control levels (voice) but backfires at high levels (vote). Having a voice enhances budget and task satisfaction, as compared to having a vote. We also find, as did Lindquist (1995), that experimental participants perceive vote to be a countervailing form of pseudo-participation, which results in a decrease in satisfaction. Finally, while Lindquist (1995) could not find performance effects with the toy castle task, our paper quilt task shows a significant positive effect for performance, such that participants with a voice outperform those without one. Additionally, participants with only a voice outperform those with only a vote.
Our findings imply several things. First, we show that, if a study aims to replicate or extend previous work, adopting an appropriate methodology with common metrics is essential. We operationalize procedural justice (voice and vote) in exactly the same manner as Lindquist (1995) and find the same relationships between it, at low (voice) and high (vote) levels of process control, and satisfaction, as he did. Second, we deliberately use a different manual task (paper quilt making) but keep the type of task in line with the recommendations of Birnberg and Nath (1968) and Shin and Grant (2019) that representative manufacturing tasks should be repetitive and mundane. That likely explains why voice and vote lead to enhancements in satisfaction in our study similar to those in Lindquist (1995), and why extensions using symbol decoding have had varied findings. Third, in employing paper quilt making, we offer the management accounting literature a valid manual-based task to use in experimental research. Finally, our study, unlike Lindquist's (1995), finds performance enhancements for participants with voice. Individuals given a voice outperformed those with a vote only or with no input, even when the piece-rate standards assigned to them were unattainable. Because we matched all other parameters in our study to those of Lindquist (1995), this might suggest that making paper quilts is a superior task to building castles from plastic pieces. Quilt making is much simpler than castle building, and perhaps the impact of learning played a smaller role.
The remainder of the paper is organized as follows. Section 2 reviews the literature and develops the hypotheses. Section 3 describes the method used to conduct this replication and extension, and Sect. 4 presents results. Section 5 provides a discussion and conclusions and addresses limitations of the study as well as directions for future research.

2 Literature review and development of hypotheses

Replication of experimental studies in management (accounting) research
Replication studies in the natural sciences are plentiful and often repeat laboratory experiments in exactly the same way as the original study (Hensel 2019). Replications are encouraged in medical and pharmacological fields to confirm the efficacy and safety of findings. But what of the social sciences? For decades, a stigma has existed toward replications in the social sciences. Here, the consensus seems to be that publishable manuscripts must extend previous research and offer new perspectives. While there are benefits to this paradigm, it creates the danger of making assumptions based on studies with limited sample sizes or errors (Schmidt 2009). Fortunately, Block and Kuckertz (2018) found that replications of experimental studies are increasingly common in social sciences, such as psychology, economics, and management. Brandt et al. (2014) recognize that, since 2012, prestigious psychology journals have been willing to publish both failed and successful replications. This is vital, as other psychologists have demonstrated that too many published findings are not robust (Ortmann 2017) and perform poorly when replicated (Open Science Collaboration 2015). In economics, scholars have only recently begun to address the lack of replications (Motyl et al. 2017). Encouragingly, replications have been published in reputable academic journals and show that experiments, particularly in behavioral economics, seem to fare relatively well when reproduced (Camerer et al. 2016). Finally, management science lacks replications of key research (Kepes et al. 2014; Makel and Plucker 2014), which is troubling because this line of research is grounded in human behavior (Morrison et al. 2010). While a few management journals (e.g., Management Review Quarterly) have recently begun to publish replications (Block and Kuckertz 2018), the number is still low (Hensel 2019). In management accounting, reanalysis of key works is rare, and only now are top journals calling for replication studies.
For our study, we choose to replicate Lindquist (1995), because the author's work encompasses all three social science paradigms discussed above: psychology, economics, and management science. It borrows constructs of procedural justice and referent cognitions studied in psychology (e.g. Lind et al. 1990; Skarlicki and Folger 1997; Tyler 1994; Tyler and Blader 2003) and implements incentives for a real effort task, as is common in behavioral economics (e.g. Carpenter and Huet-Vaughn 2019). It further addresses performance in connection with budget setting, which is a prominent issue in both general management and management accounting (Daumoser et al. 2018; Derfuss 2016; Liessem et al. 2015). Further, Lindquist's (1995) seminal work, which pioneered justice research in accounting (Indriani 2015), is broadly cited. That may explain why a number of authors have already crafted experimental studies to try to extend it and investigate the impact of justice on varied dependent measures (Libby 1999; Libby 2001; Chow et al. 2001; Libby 2003; Byrne and Damon 2008; Francis-Gladney et al. 2008; Nahartyo 2013; Kelly, Webb and Vance 2015; Gomez-Ruiz and Rodriguez-Rivero 2018). Also, numerous other authors have developed field studies and surveys to further investigate the impact of justice issues in accounting (Lau and Lim 2002a; Lau and Lim 2002b; Lau and Tan 2005; Lau and Tan 2006; Chong and Strauss 2017; Zainuddin and Isa 2019; Sudarwan 2019; Habran and Mouritsen 2020; Safkaur and Pangayow 2020). Since our replication of Lindquist (1995) has an experimental design, we focus on the nine experimental extensions listed above in the remainder of this literature review.

The operationalization of procedural justice in experimental participative budgeting studies
Two key findings of Lindquist's original study arouse interest because they are not entirely supported in follow-up studies or significantly contradict theory. To begin, Lindquist finds that voice, as a form of low process control without decision control, enhances satisfaction in the presence of stretch targets. This increase in satisfaction is known as the "fair process effect" and is independent of the perception of budget attainability. Hence, being allowed to express opinions, thoughts, and feelings boosts satisfaction, even with stretch targets. In his original study, Lindquist (1995) provides a basis for the operationalization of procedural justice as two independent variables (voice and vote) by employing a continuum of participation based on the Vroom-Yetton model. 1 This continuum ranges from no input (i.e., having neither voice nor vote) to voice, indicating low process control, then to vote, and finally to voice and vote (i.e., the highest level of control). In Lindquist's study, voice refers to individuals' opportunities to express their opinions, thoughts, feelings, etc., and to comment on standard setting, while vote refers to individuals' opportunities to communicate their preferred standards to superiors. Secondly, Lindquist (1995) finds that, when participants who receive unattainable standards are given an opportunity to vote (a higher form of process control), they form referent cognitions of what their standard might have been (i.e., an attainable standard). These referent cognitions lead to perceptions of pseudo-participation, which result in declines in both budget and task satisfaction, where there should be increases from the higher form of process control (vote). As to Lindquist's (1995) investigation of the impact of voice as a form of low process control on performance, his results show that enhancing procedural justice by allowing employees to voice their opinions, thoughts, feelings, etc., does not increase performance.
Note that, in Lindquist's (1995) study, both voice and vote include an explanation as to why standards must be higher than what is attainable by the participants in certain conditions. Table 1 presents a summary of variables, operationalizations, and key findings used in seven of the nine experimental research studies that have extended Lindquist (1995). Libby (1999) operationalized voice as the opportunity to vote for preferred standards. She also tested explanation (yes/no) as to why the initial standard had to stay at a high amount. Clearly, Libby was actually testing the interaction of vote and explanation, not voice. Measuring voice requires that participants vocalize their responses and not just write down their thoughts and feelings (Thibaut and Walker 1975). So, in this case, Libby's (1999) operationalization of voice is actually Lindquist's (1995) representation of vote. Chow et al. (2001) measured voice in exactly the same manner as Lindquist (1995) but administered it in conjunction with vote to form consultative participation. They focused on national culture and found U.S. participants to have significantly less satisfaction than Chinese participants with high-stretch (unattainable) standards. Libby (2003) operationalized voice through a case where participants were told that their supervisor was sincerely interested in their opinions. She found exposure to voice in this fashion led to participants producing less slack in an experimental procedure. Byrne and Damon (2008) operationalized voice in exactly the same fashion as Libby (1999) and thus also actually measured vote. Finally, Nahartyo (2013) operationalized controlled procedural justice as giving participants an opportunity to express their thoughts about an initial budget (voice).
He additionally included voting for a preferred standard in this construct, much like Chow et al. (2001) did, and found controlled procedural justice to significantly boost perceptions of procedural justice.
As mentioned above, Libby (1999), in trying to measure voice, actually measured vote but did find significant performance improvements for participants allowed a voice (vote) and an explanation. Recall that Byrne and Damon (2008) also represented vote as voice and found no effect for voice. It seems the countervailing, pseudo-participation effect of vote with an unfavorable outcome (high standards) negated the positive impact of what they termed voice. Lastly, Gomez-Ruiz and Rodriguez-Rivero (2018) constructed consultative participation as real high participation (vote for a preferred standard and receive it), real low participation (vote for a preferred standard and receive a standard 20% higher), or pseudo-participation (vote for a preferred standard and receive a standard that is "much higher"). They found autonomous motivation to be higher when employees participate in a real versus a pseudo-consultative process.
The finding that vote for an attainable standard and subsequent receipt of an unattainable one results in perceptions of the vote representing pseudo-participation is corroborated by Francis-Gladney et al. (2008). They found, using a decoding experimental task, that, if participants receive a favorable (attainable) standard after voting for their preference, they believe budget participation was real. If, however, after a vote for a preferred (attainable) standard, participants receive an unattainable budget, pseudo-participation is blamed.
As discussed above, Lindquist did not find any performance effects when participants have a vote, that is, decision control, or both a voice and a vote, which represents the highest level of process control. Libby (1999), however, did find a positive performance effect for participants who received voice (vote) and an explanation for high standards versus participants with no voice (vote) or explanation. Chow et al. (2001) found Chinese participants adhered more strongly to stretch targets than did their U.S. counterparts. Byrne and Damon (2008) found higher performance for the group receiving an explanation for high performance standards. They additionally found that offering a different explanation for high standards in each production round led to significantly higher performance than offering the same one each time. While not directly measuring voice or vote, Francis-Gladney et al. (2008) found that, when participants received an unfavorable (unattainable) performance standard, they formed strong pseudo-participation perceptions. Also, Kelly et al. (2015) did not find any general impact on performance of a form of procedural justice known as ex-post goal adjustments. In their study, the adjustments are defined as an ex-post decrease in the target budget, considering negative uncontrollable factors that arise during production, that gives a worker a better chance to meet the goal and earn a bonus. Ex-post goal adjustments are an effective form of procedural justice when budgets received are moderate, but there is no impact on performance when standards are difficult.
The inconsistent findings from the studies above reveal that the relationship between procedural justice and satisfaction as well as performance in participative budgeting requires more reflection and clarification. In our study, we offer two main explanations for the contradictory results. First, as discussed above, we reason that conflicting findings stem largely from variations in research design. More precisely, the independent variable, procedural justice, has been operationalized and measured in different ways by Lindquist (1995) and subsequent researchers. Second, we propose that a manual-based task, as was used in Lindquist's original study and in the follow-up study by Chow et al. (2001) that Lindquist co-authored, leads to different results than the mental-based decoding task used in all other follow-up experimental studies. We elaborate on this below.

Nature of task in experimental studies on procedural justice in participative budgeting
Environmental variables play a role in experimental research to elicit participants' responses in a pre-determined way (Birnberg and Nath 1968). Over 50 years ago, Birnberg and Nath (1968) laid the groundwork for experimental research in management accounting by recognizing the dichotomy between manual (physical) tasks versus those involving mental effort. They note that, in an iconic simulation like a laboratory experiment, the experimental task reduces a complex real-world manufacturing setting to a simpler substitute. They further indicate this substitute must be a representative task that participants react to in the same manner as manufacturing workers would to their tasks. Thus, the replication of a production line should employ a task that induces fatigue and boredom (Shin and Grant 2019). Boredom is defined as an unpleasant emotional state characterized by disinterest and difficulty with concentration (Fisher 1993;Loukidou et al. 2009). Birnberg and Nath (1968) propose that the use of a mental task in a laboratory experiment can introduce an unintentionally high level of participant interest. When a task is intrinsically motivating, effort can be its own reward (Amabile 1993;Keller and Bless 2008). Chow's (1983) symbol decoding job is one such task. Since most experimental participants in the above-referenced research are students of accounting and other business studies, their perceptions of satisfaction and performance measures may be driven by their interest in the task at hand. They may develop an intrinsic desire to improve upon their last performance, separate from the monetary payoff at the conclusion of the experiment (Shin and Grant 2019).
In all experimental studies extending Lindquist (1995), tasks are repeated for a certain number of production periods. In Lindquist's (1995) study and the study by Chow et al. (2001), participants repeatedly build toy castles from the same number and kind of Loc Blocs© following an exact template. In the mental decoding tasks, participants are confronted with a new set of symbols each production run and must translate these symbols into letters or words. Charness et al. (2018) find that repetitions of constant manual tasks lead to participant fatigue and boredom over time. While manual tasks are sometimes initially fun, they gradually become less motivating, leading to possible satisfaction and performance declines with repetitions. Thus Birnberg and Nath (1968) recommend manual tasks for research in management accounting, because these tasks better represent real-world situations, where the intrinsic interest of the job is low.
Two further task characteristics that are important within experimental accounting research are difficulty and the participant's familiarity with the task (Birnberg and Nath 1968). An ever-changing mental decoding task is likely more difficult than repetitive castle building, and more difficult tasks imply a higher cost of effort (Charness et al. 2018). Further, most participants presumably have first-hand experience with Loc Blocs© but are less familiar with decoding tasks. When the level of difficulty or the participant's familiarity differs between tasks, results can be biased (Birnberg and Nath 1968). For example, in the relatively easy and well-known castle task, every participant should be able to build some castles with Loc Blocs©. Consequently, the level of performance should be quite similar across participants. In the more difficult and less familiar decoding task, some participants could excel while others fail. Given this, the level of performance might vary across participants.
The perception of task difficulty and the effort participants make to solve the respective task certainly also depend on participants' individual characteristics. In general, we may assume that solving mental tasks, similar to IQ tasks, depends mainly on participants' ability to make sense of complex facts, that is, eductive ability (Raven et al. 1998), and to store and reproduce information, that is, reproductive ability (Raven et al. 1998). In decoding tasks, participants low in or without these abilities will likely be unable to compensate through greater effort. For example, participants who are uncomfortable with numbers or bad at memorizing, analyzing, or thinking creatively may have difficulty translating as many symbols into letters as more talented participants, no matter how hard they work. This is supported by Eckartz et al. (2012), who found that, in such cases, incentives seem to have very small effects on performance and differences in performance relate predominantly to individual skills rather than effort. Tasks like adding numbers or repetitive calculations clearly depend on both ability and effort (Eckartz et al. 2012). This might be the reason for variations in performance effects with manual versus mental experimental tasks. Clearly, further research is needed to help clarify inconsistencies among performance effects in extensions of Lindquist (1995). We believe comparable results can be achieved only by replicating Lindquist's research design as exactly as possible, including using a manual task, as he did.

Hypotheses
If our assumptions are correct and inconsistent findings in follow-up studies stem from variations in research design, a replication study that retains Lindquist's research design and employs a manual-based experimental task should produce the same results. We embrace calls to replicate key studies before branching into new extensions of the work (Otley and Pollanen 2000). These calls also suggest slight variations in measures or empirical methods might add interest to the replication. Thus we hold constant Lindquist's (1995) manipulations of procedural justice (i.e., voice and vote) and focus only on situations of distributive injustice, that is, the receipt of unattainable standards. We also introduce a new manual-based task (i.e., paper quilt making) to the literature to illustrate that subsequent findings can be robust when retested with consistent experimental metrics. We therefore offer three hypotheses regarding the impact of procedural justice on satisfaction and performance, as did Lindquist (1995).

Hypothesis 1 Individuals allowed a voice only will be more satisfied with their budgets than individuals with similar ability offered no input or a vote only when unattainable budgets are received.
Hypothesis 2 Individuals allowed a voice only will be more satisfied with the experimental task than individuals with similar ability offered no input or a vote only when unattainable budgets are received.
Hypothesis 3 Individuals allowed a voice only will perform better than individuals with similar ability offered no input or a vote only when unattainable budgets are received.

Participants
Our sample consists of 76 undergraduates from a mid-sized state university in Southern Austria. 2 To recruit participants, an email was sent to all enrolled students at the university. Students were invited to register anonymously for time slots provided in Doodle and arrived at rooms prepared for the experimental study at their scheduled time. The sample was first reduced to 69 participants, because the time limit of one hour was exceeded in the first experimental run with six parallel sessions and because one participant in another session did not understand the incentive scheme properly. The sample was again reduced to n = 66 after a proper yoking of participants occurred to balance conditions on ability. 3 The mean age of participants in our experiment is 25.2 years, and the participants include 22 men and 47 women. On average, they have 7.4 years of full-time and 2.7 years of part-time work experience. Thirty-six percent of participants come from business and management, 28% from cultural studies (e.g., language, teaching and education), 26% from psychology, and 10% from technical studies. We tested for confounding effects due to age, work experience, and field of studies. No effects were found.

Experimental task
Recall that Lindquist (1995) had participants build toy castles from Loc Blocs©. Castles were constructed of 26 plastic toy pieces of various sizes and shapes. A quality castle needed to match a model in size and number of pieces. Additionally, four key pieces needed to match the model in both size and exact color. Our task is designed to emulate the manual aspects of the castle building. Participants are charged with constructing paper quilts out of pre-cut scraps by gluing the paper scraps down on a sheet of paper, following a model of a quality quilt given to them. Figure 1 presents the model of a quality quilt.
Participants are provided with a box filled with pre-cut pieces of paper in assorted colors, shapes, and sizes. They are also provided a glue stick and a stack of white paper on which is printed an outline grid (i.e., an uncolored, blank rectangle) showing where to place pieces. The outline grid measures 11 cm × 10 cm. As the model of a quality quilt in Fig. 1 shows, pieces no. 3 and no. 6 are of the same size and shape. All other pieces have different sizes and shapes. Only six paper pieces are needed to complete one quality quilt, but they must all fit the grid perfectly in size and shape. Also, two of the pieces (i.e., no. 3 and no. 5) need to match not only in size and shape but in color as well (i.e., yellow and blue, respectively). Furthermore, one piece (i.e., no. 6) must not be yellow. We deliberately use light pastel paper of six different colors so that colors are not readily identifiable. We include a question in our final survey to ensure none of our participants were colorblind.
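To make the quality rules concrete, they can be encoded as a simple check. The sketch below is purely illustrative and not part of the experimental materials; the piece numbering, color names, and the fits_grid flag are our own assumed encoding of the rules stated above.

```python
# Illustrative sketch (our own encoding, not part of the study materials) of
# the quality-quilt rules: six pieces, all fitting the grid, piece no. 3
# yellow, piece no. 5 blue, and piece no. 6 any color except yellow.
def is_quality_quilt(pieces):
    """Check a finished quilt, given as {piece_no: (color, fits_grid)}."""
    if set(pieces) != {1, 2, 3, 4, 5, 6}:             # exactly six pieces used
        return False
    if not all(fits for _, fits in pieces.values()):  # every piece fits the grid
        return False
    if pieces[3][0] != "yellow":                      # piece no. 3 must be yellow
        return False
    if pieces[5][0] != "blue":                        # piece no. 5 must be blue
        return False
    if pieces[6][0] == "yellow":                      # piece no. 6 must not be yellow
        return False
    return True
```

A quilt failing any single rule, such as a yellow piece no. 6, would not count toward the production standard.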

Independent variables
We use a 2 × 2 full-factorial design with two independent, between-participants variables to operationalize procedural justice: (1) voice and (2) vote. 4 Unlike Lindquist (1995), we do not manipulate standard attainability (fairness), because we are only interested in the outcomes related to unattainable (unfair) standards. When the standards set are more difficult to achieve than the ones attainable by the subordinates (i.e., the participants), the budget allocation is perceived to be unfair, and this helps to ensure that subordinates concentrate on the fairness of the process in making their overall fairness judgments. (See also Byrne and Damon 2008; Libby 1999, based on the two-component model of justice by Cropanzano and Folger 1991.) Our manipulations of voice and vote exactly follow Lindquist (1995). Voice is manipulated as either giving participants the opportunity to express their opinions, thoughts, feelings, etc. (voice), or not (no voice), before receiving their standards. Following Lindquist (1995), voice also includes an explanation for receiving unattainable standards as well as a compromise in the amount of product that must be built in the experiment in periods two and three (the two periods where standards are ratcheted up 30% and 70% higher than what is attainable). Details of the compromise are discussed below. When participants are given a voice, the experimenters, that is, both the foreman and manager, encourage the participants to comment on the standard setting and on the level of standards set in previous periods. They also ask for the range of standards that the participants think is attainable. By contrast, when participants do not have a voice, they are not granted any of these possibilities, and communication between participants and experimenters is minimized.
Participants in the vote condition alone are never asked what range of standards they feel is attainable. They write that information down for themselves, but the information is kept private. In the vote manipulation, participants are either given the right to communicate their preferred standard (vote) or they are not asked for their preference (no vote). When participants are given a vote, they are told that a person's personal choice influences the management's decisions, even if management has the final say. Participants with a vote are reminded to make sure to choose standards that maximize their compensation within their ability. Participants receiving a vote are also given an explanation and compromise as discussed above. By contrast, when participants do not have a vote, they receive standards set solely by management, regardless of what they think is attainable (unless they have a voice). Consequently, there are four possible conditions. Figure 2 illustrates them.
Irrespective of vote, individuals given a voice have low process control (VOICE in Fig. 2), while individuals without a voice have less process control (NO VOICE in Fig. 2). Similarly, irrespective of voice, individuals who are offered a vote have high process control (VOTE in Fig. 2), while individuals without a vote have low process control (NO VOTE in Fig. 2). Consequently, individuals who are given both voice and vote have the highest level of process control, while individuals with neither a voice nor a vote have no input at all. Individuals in the no-input condition rely solely on the incentive contract to motivate their performance.
The independent variable voice is validated with an ANOVA measuring responses to two statements from the final questionnaire on a five-point Likert scale, which are combined into one construct. A higher mean indicates a stronger perception of having a voice. The independent variable vote is likewise validated with an ANOVA measuring the response to one statement from the final questionnaire, also on a five-point Likert scale, with a higher mean indicating a stronger perception of having a vote. Further details on the measurement as well as results are presented below in the section on the manipulation checks. Table 2 provides an overview of all variables, their definitions, values, and measurements.

Dependent variables
Three dependent variables are measured in our study: (1) satisfaction with the unattainable budgets received, (2) satisfaction with the experimental task, and (3) performance. Each of the two dependent variables on satisfaction is measured with responses to four statements from the final questionnaire on a five-point Likert scale, which are combined into one construct. A higher mean indicates greater satisfaction. Performance is measured as the sum of actual production of quality paper quilts in production rounds 2 and 3, because these are the two periods where standards are ratcheted up 30% and 70%, respectively, either from participants' preferred standards when they are given a vote (conditions 1 and 3), from participants' vocalized attainable standards when they are given a voice but no vote (condition 2), or from participants' supposedly private, written attainable standards when they cannot give any input (condition 4). Details regarding standard attainability are presented in Sect. 3.5, and the experimental pacing is described in detail in Sect. 3.7.

Measurement of standard attainability across conditions
The written range of attainable standards, requested from all participants at the beginning of each production period, is private information, and how it is used depends on the condition. In condition 1 (voice and vote), participants first orally discuss the attainable range they wrote down and express their feelings about it (voice). They are then asked to vote for a preferred standard under which they will work under the truth-inducing incentive scheme. The performance standard is set at the chosen standard in production period 1, 30% above the chosen standard in period 2 (a compromise from management's initial desire to raise it by 45%), and 70% above the chosen standard in period 3 (a compromise from management's initial desire to raise it by 85%). The written attainable range is thus not used. In condition 2 (voice/no vote), participants speak about their attainable range but are not given a chance to vote for a preference. Their standard is therefore set at the midpoint of the vocalized range in period 1, 30% above that midpoint in period 2 (after the same compromise from 45% discussed above), and 70% above it in period 3 (after the same compromise from 85%). Again, the written attainable range is not used.
In condition 3 (no voice/vote), standard setting follows the pattern of condition 1, basing the budget on the chosen standard, with the compromises in periods 2 and 3, and again ignoring the written attainable range. Finally, in condition 4 (no voice/no vote), the written attainable range is used to set standards. Even though this information was deemed to be private, the experimenters could in actuality glance at the written attainable range before participants set it aside and establish a midpoint, which they relayed to management. Management then sets the standard at the midpoint for period 1, 30% above the midpoint for period 2, and 70% above the midpoint for period 3, with no explanation.

Table 2 (excerpt): TASKSAT ranges from 4 to 20, with a higher value indicating higher task satisfaction; it is built from four five-point Likert items ((1) strongly disagree to (5) strongly agree) such as "On the whole, I was satisfied with this task" and "I would recommend this task to someone else as one that is satisfying." Performance ranges from 0 to 14 and is the sum of quality paper quilts produced in production rounds 2 and 3. TOTALFAIR, the perception of fairness across production rounds 1-3, ranges from 3 to 15, with a higher value indicating a higher perception of fairness; in every production round, participants rate the statement "The way this company goes about setting my standard is" on a five-point Likert scale ((1) extremely unfair to (5) extremely fair).
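The ratcheting rule common to all four conditions can be sketched as follows. This is our own illustration, not the authors' code; the base is the voted-for standard (conditions 1 and 3) or the midpoint of the attainable range (conditions 2 and 4), and rounding to whole quilts is an assumption we make for the sketch, not a detail stated in the procedure.

```python
def ratcheted_standards(base):
    """Standard-setting rule: the base standard in period 1, then 30% and
    70% above it in periods 2 and 3.

    base: the voted-for standard (conditions 1 and 3) or the midpoint of
    the attainable range (conditions 2 and 4). Rounding to whole quilts
    is an assumption for illustration.
    """
    return {1: base, 2: round(base * 1.30), 3: round(base * 1.70)}

# e.g., a base standard of 10 quilts yields period standards of 10, 13, and 17
```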

Compensation
Participants are paid a piece-rate wage based on quality units produced. Pre-testing determined that € 0.40 per unit was appropriate for our study. Payment is calculated according to the scheme in the following equation:

Y = € 0.40 × A − € 0.40 × |A − S|

where Y is the individual pay per production run, S is the performance budget, and A is the actual performance of the individual. This truth-inducing incentive scheme motivates individuals to attain budgets or face a financial penalty. It also motivates them to express as high a budget as they feel is attainable to maximize their financial rewards, as there is no financial reward for producing over standard. This scheme has been employed effectively in previous accounting research (Chow et al. 2001; Lindquist 1995; Young 1985; Young et al. 1993). Experimental participants are walked through the following examples until they confirm they understand the scheme. Imagine the standard is 6 and actual production is 4. Financial compensation is then € 0.40 (4) − € 0.40 |4 − 6| = € 0.80; that is, compensation is reduced by a € 0.80 penalty for not attaining the standard. If, however, a standard of 6 matches actual production of 6, financial compensation is € 0.40 (6) − € 0.40 |6 − 6| = € 2.40. Finally, if the standard is 6 and actual production is 8, participants do not receive any additional compensation but only € 0.40 (8) − € 0.40 |8 − 6| = € 2.40. Thus there is no financial reward for producing beyond standard. Overall, participants receive higher compensation when they make more quality quilts, but only if they do not deviate from the standard, that is, the performance budget.
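The three examples can be reproduced with a short sketch of the scheme (the function name and rounding to the cent are ours for illustration; the € 0.40 rate is from the study):

```python
def pay(actual, standard, rate=0.40):
    """Truth-inducing incentive scheme: piece-rate pay minus a symmetric
    penalty for any deviation, over or under, from the standard.
    Returns pay in euros, rounded to the cent."""
    return round(rate * actual - rate * abs(actual - standard), 2)

print(pay(4, 6))  # under standard: 1.60 minus a 0.80 penalty -> 0.8
print(pay(6, 6))  # exactly at standard: 2.4
print(pay(8, 6))  # over standard: the penalty cancels the extra output -> 2.4
```

Note that for any output at or above the standard, pay is flat at rate × standard, which is why the scheme rewards truthfully reporting the highest attainable standard rather than overproducing.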
Average pay to participants is € 10 for one hour of their time. Due to time constraints, our experiment ran in half the time of Lindquist's (1995). We shortened the production runs from ten minutes to eight and generally kept a brisk pace to make up time. Also, the paper quilt task, with only six pieces of assemblage, did not take as long to complete as the toy castle task, with 26 pieces. Figure 3 presents detailed, side-by-side steps of our experiment as compared to Lindquist (1995). In every phase of the process, except for administering the paper quilt task instead of the castle-building task, we replicate Lindquist's work as closely as possible. Participants signed up for a one-hour block of time and, upon arrival, were greeted by the head experimenter, playing the role of the manager, and escorted to a seating area to await production room assignments. Production rooms were sealed off from one another regarding sight and sound, so participants were unaware of what their "co-workers" were experiencing. In all conditions, experimenters followed a written script (see Appendices B and C).

Experimental pacing
We used six experimenters who were familiar with all four experimental conditions, but each experimenter was always assigned the same two conditions of the four. The manager was played by the same person in all sessions. Experimenters underwent approximately one and a half hours of training before the experiment began. The manager led each participant to one of the production rooms and introduced him or her to the foreman (i.e., the experimenter). Next the foremen read the instructions and the case scenario to the participants and had them complete an unpaid practice period of eight minutes, followed by an additional eight-minute paid practice period with a set payment of € 0.40 per paper quilt. The objective of the unpaid practice round was for participants to familiarize themselves with the task, ask for assistance if necessary, and get detailed feedback on the quality of their quilts. The objective of the paid practice round was to familiarize participants with a piece-rate incentive of € 0.40 per paper quilt. Next participants completed three actual production runs of eight minutes each using the truth-inducing incentive scheme. To avoid potential end-of-game effects, participants were not told the total number of production periods. Participants were placed randomly into one of four conditions: (1) voice and vote, (2) voice only, (3) vote only, and (4) no input. The experiment took two days to run, from 8:00 am to 5:00 pm both days.

Fig. 3 Experiment steps in Lindquist (1995) versus our study (2020). The figure lists side-by-side steps: two days to complete, running from 8:00 am to 5:00 pm (Tuesday and Wednesday); a yoking process to balance ability across conditions; a truth-inducing incentive scheme used to reduce the propensity to create slack; random assignment to conditions; reception of participants, approximately five at a time, by the head experimenter playing the role of the manager; production rooms completely sealed off from one another regarding both sight and sound, each with one experimenter playing the role of the foreman and containing two chairs and a table on which sat a box filled with Loc Blocs© pieces or paper quilt pieces (plus glue stick and white paper), respectively, of assorted colors and sizes; experimenters following a written script; experimenter training (two hours versus one and a half hours); unpaid and paid practice periods (10 versus 8 minutes, $/€ 0.40 per unit); and, in each of the three production periods, range attainment setting, voice, vote, budget setting, an official production run (10 or 8 minutes, respectively), and a count of quality toy castles or quality paper quilts, respectively.
In all conditions, participants were asked to complete a series of documents. First, there was a questionnaire on demographics at the beginning of the experiment. Second, at the beginning of each of the three production rounds, participants were asked to state their attainable range of production of quality paper quilts in one production period of eight minutes, to obtain their individual ability measures and to have a baseline for the increase in standards in the no-input condition (no voice and no vote). This questionnaire was put aside by the foreman so that participants perceived the attainable range as private information at this point. However, in all cases, the foreman could glance at this attainable range and relay that information to the manager. The third questionnaire was administered immediately following the setting of performance standards in every production period and asked participants to respond to the following statement: "The way performance standards are set around here is fair." Its purpose was to measure participants' perceptions of fairness in the sense of procedural justice as they moved through the experiment. This document was also placed away from the foreman and represented private information to the participant. Finally, after the third and last production period, all participants filled out a final survey questionnaire, which provided data for dependent measures related to satisfaction as well as other information (e.g., for manipulation checks).

Demographics
Even though participants were randomly assigned to experimental conditions, we conduct an ANOVA of demographic data, including age, gender, and full-time or part-time work experience, to ensure they are evenly distributed. There are no significant main effects or two-way interactions. Chi-square tests likewise show no significant differences between conditions regarding age, gender, or work experience.

Manipulation checks
The manipulation checks concern the independent variables voice and vote (procedural justice) and are tested with ANOVAs; they measure whether voice and vote were perceived as existing. The first manipulation check, for voice, aims to determine whether voice is viewed as participation (procedural justice). Responses to two different statements in the final questionnaire are summed to measure differences in opinion regarding the amount of voice allowed participants. The first statement reads: "I could vocalize my feelings." The second states: "I had no chance to tell the foreman how I felt" (reverse-scored). Scale reliability for this two-item scale is very satisfactory, with a Spearman-Brown coefficient of 0.81. An ANOVA of the full model with the composite dependent variable is run. A strong main effect is found for voice (F 1,65 = 27.08, p < 0.001; voice 8.14 versus no voice 5.25). Investigation of two-way t-test contrasts indicates the manipulation of voice is robust across vote conditions. Participants with a voice only (8.28) believe they have significantly more voice than participants allowed a vote only (4.24); t 33 = 5.14, p < 0.001. These findings lend strong support to the conclusion that the voice manipulation worked as intended: voice is viewed as a means of participation. Tables 3, 4, and 5 provide means, ANOVA results, and t-test contrasts for the manipulation checks and tests of hypotheses.
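For reference, the Spearman-Brown coefficient of a two-item scale follows directly from the inter-item correlation; the helper below is our own illustration, not the authors' code. A reported coefficient of 0.81 corresponds to an inter-item correlation of roughly 0.68.

```python
def spearman_brown(r, k=2):
    """Spearman-Brown prophecy formula: reliability of a test lengthened
    by factor k, given the single-item (or half-test) correlation r.
    For a two-item scale, k = 2."""
    return k * r / (1 + (k - 1) * r)

# an inter-item correlation of ~0.68 yields a two-item reliability of ~0.81
```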
Note that a main effect for vote is also significant in the two-way ANOVA (F 1,65 = 4.33, p = 0.042; vote 6.12 versus no vote 7.27). One might expect this to happen, as participants with a vote are asked for their preference of standard. However, this significance indicates that participants with no input (no voice and no vote) perceive they have significantly greater voice than those with a vote only (6.28 versus 4.24). This phenomenon is discussed in upcoming sections.
The second manipulation check, for vote, determines whether vote is viewed as participation (procedural justice). Responses to the statement, "My decision as to the level of my standard had a lot to do with the standard that had been set for me," are used as the dependent variable in an ANOVA of the full model. A significant main effect is not found for vote (F 1,65 = 0.65, p = 0.42; vote 2.97 versus no vote 3.21). This is not too surprising, since our hypotheses claim that voice will have the strongest positive impact on the dependent measures, as compared to vote or a combination of voice and vote. We do find, however, that the two-way interaction is marginally significant at F 1,65 = 2.67, p = 0.11, which calls for t-test contrasts. The vote manipulation should not result in a statistically significant difference, since vote is seen as a form of pseudo-participation. As we predict, voice is the stronger form of procedural justice in our study. This is confirmed by comparing participants with a voice and a vote (2.81) to those with a voice only (3.56); t 32 = −1.72, p < 0.10. Individuals with a voice only perceive they have more voice than those with a voice and a vote. This supports vote being seen as a form of pseudo-participation. Note that these findings do not imply a vote is not a form of high process control but rather, as in Lindquist (1995), that it is simply not seen as one when unattainable work standards are assigned.

Tests of hypotheses
To test all hypotheses, full-factorial models are analyzed separately with the dependent variables of budget satisfaction, task satisfaction, and performance. Budget satisfaction reflects individuals' satisfaction with the unattainable budgets (i.e., standards) received and is measured with responses to four statements from the final questionnaire: (1) On the whole, I was satisfied with the standards under which I worked. (2) The standards I worked under were convenient for me. (3) I would have preferred standards different from those I received (reverse-scored). (4) I liked the standards which I received. Responses to these four statements are measured on a five-point Likert scale and combined into one construct named BUDSAT, with a Cronbach's alpha of 0.91. The higher BUDSAT, the more satisfied individuals are with the budget, that is, the standards they receive. Task satisfaction reflects individuals' satisfaction with the experimental task and is measured by the summation of responses to four statements from the final questionnaire, also measured on five-point Likert scales: (1) On the whole, I was satisfied with this task. (2) I would recommend this task to someone else as one that is satisfying. (3) I found this task to be unpleasant (reverse-scored). (4) I feel fortunate to have been involved in this task. Responses to these four statements are combined into one construct called TASKSAT, with a Cronbach's alpha of 0.80. The higher TASKSAT, the more satisfied individuals are with their experimental task.
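The scoring of these constructs is mechanical; as an illustration (our own sketch, not the authors' code), a reverse-scored item on a five-point scale is folded in as 6 minus the raw response before summing:

```python
def score_construct(responses, reverse_items=()):
    """Sum four 5-point Likert responses into a construct (range 4-20),
    reverse-scoring the listed item positions (0-based) as 6 - raw."""
    return sum(6 - r if i in reverse_items else r
               for i, r in enumerate(responses))

# BUDSAT with item 3 reverse-scored: raw responses (4, 4, 2, 4) score as
# 4 + 4 + (6 - 2) + 4
print(score_construct((4, 4, 2, 4), reverse_items={2}))  # 16
```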
Performance is measured as the summation of actual production of quality paper quilts in production rounds two and three, where standards are ratcheted up 30% and 70% respectively from the midpoint of participants' preferred ranges of standards (cond. 1 and 3), vocalized attainable ranges (cond. 2), or attainable ranges discreetly communicated to the manager (cond. 4), depending on the condition. Performance and standards set in each period are reported in Table 6.

Hypothesis 1
Hypothesis 1 predicts that individuals allowed a voice only will be more satisfied with the standards received than individuals with similar ability offered no input or a vote only, when unattainable budgets are received. The main effect for voice is not significant; F 1,63 = 2.14, p = 0.15. However, we believe this finding is driven by the negative influence of vote (i.e., half of the participants with a voice also had a vote). Lindquist (1995) established that having a vote, though it is a higher level of process control, results in declines in satisfaction with the budget. Therefore the real test of this hypothesis lies in the contrast analysis of participants. Here we find participants with a voice only (12.06) are more satisfied with stretch targets than those with no input (10.87), but the difference is not significant; t 31 = 0.81, p = 0.21. More importantly, we also find participants with a voice only (12.06) are significantly more satisfied with stretch targets than those with a vote only (9.25); t 31 = 2.10, p = 0.02. Overall, we find support for Lindquist's (1995) original findings regarding levels of procedural justice and budget satisfaction; hypothesis 1 is partially supported. Two questions then arise: why did the influence of vote drive the main effect for voice to insignificance, and why does voice lead to significantly greater satisfaction with unattainable standards than vote? The reason lies in participants' perception that vote is an insincere form of procedural justice, that is, a form of pseudo-participation. To address why this might be, we examine individuals' perception of fairness through the entire experiment. To measure this effect, we create a new variable, TOTALFAIR, by summing the fairness perceptions in production rounds 1-3. After each round, participants are presented with this statement: "The way standards are set around here is fair."
A five-point Likert scale is used, with a higher level indicating a higher perception of fairness. A full-factorial ANOVA is then run with the independent variables of voice and vote. As expected, a main effect for voice is significant at F 1,65 = 5.56, p = 0.02; voice (9.76) versus no voice (8.01). The main effect for vote, however, is not significant at F 1,65 = 0.10, p = 0.76. Contrast analysis confirms this main effect for voice and finds participants with a voice only (10.33) have significantly higher perceptions of process fairness than those with no input (7.67) at t 31 = 2.42, p = 0.01. Participants with a voice only (10.33) also perceive significantly higher budget-setting fairness than do those with a vote only (8.35) at t 33 = 2.12, p = 0.02. A perception of pseudo-participation seems to arise when a participation process is deemed unfair.

Hypothesis 2
Hypothesis 2 predicts that individuals allowed a voice only are more satisfied with the experimental task than individuals with similar ability offered no input or a vote only, when unattainable budgets are received. An ANOVA of the full model finds both main effects, for voice and for vote, as well as their interaction to be insignificant. Contrast analyses further find participants with a voice only (13.89) to have greater task satisfaction than those with no input (12.40), but again the difference is not significant; t 33 = 1.11, p = 0.14. We also again find participants with a voice only (13.89) are more satisfied with the experimental task than those with only a vote (11.65) at t 33 = 1.78, p = 0.04. It seems again that voice's insignificant main effect is being reduced by the half of that population that also had a vote. Perceptions of unfairness related to vote again result in a higher form of process control undercutting satisfaction with the experimental task. We find partial support for hypothesis 2.

Hypothesis 3
Hypothesis 3 predicts that individuals allowed a voice only will perform better than individuals with similar ability offered no input or a vote only when unattainable budgets are received. The ANOVA of the full-factorial model shows a significant main effect for voice at F 1,65 = 4.97, p = 0.03. Participants with a voice (5.58) have significantly greater performance than those with no voice (3.92). As expected, contrast analysis indicates that participants with a voice only (5.72) significantly outperform those with no input (4.20) at t 31 = 1.53, p = 0.07. In line with our other findings, participants with a voice only (5.72) also outperform those with only a vote (3.65) at t 33 = 2.08, p = 0.03. Hypothesis 3 is fully supported.

Discussion and conclusions
We designed this study to reconsider some of the findings regarding the impact of procedural justice on satisfaction and performance that have arisen since the publication of Lindquist's (1995) seminal study. Lindquist (1995) manipulated two forms (low/high) of procedural justice as voice and vote. Voice allowed participants to express their feelings regarding the setting of a piece-rate budget; vote allowed them to vote for a preferred standard. Lindquist found that a lower form of process control in setting standards (voice) led to enhanced satisfaction over a higher form of process control (vote), when one would have expected the opposite. Subsequent partial replications and extensions, however, have found relationships among voice, vote, and varied dependent measures that are both in partial agreement with and in opposition to Lindquist's findings. Some research even finds voice to be a form of pseudo-participation. We contend there are two primary problems with the experimental metrics in this prior work. First, subsequent research employs operationalizations of procedural justice that differ from Lindquist's. Additionally, the preponderance of research trying to extend Lindquist (1995) has employed a mental-based experimental task (symbol decoding), as contrasted with Lindquist's manual-based task (castle building), even though previous research suggests manual, repetitive tasks are necessary to properly represent a manufacturing environment. It is questionable whether mental-based tasks approximate manufacturing workers' reactions to measures of procedural justice.
Our study replicates Lindquist (1995) by employing exact operationalizations of his measures of voice and vote. We even obtained his scripts and experimental documentation for the abovementioned four conditions and translated them into German for the experiment to be run in Europe. We also purposely administered a different task, paper quilt making, to test whether it could elicit findings similar to those of Lindquist. We find, as did Lindquist, that participants allowed a voice only (low process control) in setting budgets experienced significantly greater budget and task satisfaction than those allowed no input. Further, we also find, as did Lindquist, that participants with high process control (a vote for the standard only) are significantly less satisfied with the task and budgets received than those with a low form of process control (voice only), thus supporting Lindquist's contention that vote is perceived as a form of pseudo-participation. Finally, unlike Lindquist (1995), our study finds a significant positive relationship between voice and performance in the experimental task. Specifically, we find participants who received a voice perform better (that is, they made more quality paper quilts) than those who did not receive a voice. Further, participants with a voice only outperform those with no input and, most importantly, also outperform participants with a vote only.
We believe our findings add not only to the management accounting literature but also, broadly, to experimental research in the social sciences. That we reproduce the findings of Lindquist (1995) regarding voice and vote speaks to the importance of following the methods of the original research as closely as possible in a replication. Only then can practitioners extend findings and draw decisive conclusions, such as that too much participation in standard setting can backfire for a manufacturer when workers must accept unattainable stretch targets. Sometimes it is better simply to give employees a chance to air their grievances and leave it at that. We also find that workers who talk out their feelings about an unpleasant outcome, that is, high-stretch performance standards, are significantly more productive than those with no chance for input.
Methodologically, we believe our findings have implications for experimental research across the social sciences. Experimental tasks, measured in controlled laboratory studies, aim to emulate real-world occupations. In management accounting, these tasks are often meant to represent a production line. An experimental manufacturing task should be repetitive and boring, forcing participants to focus on the task at hand and the piece-rate payoff. Most research extending Lindquist (1995), however, employs mental-based tasks and unsurprisingly finds diverse, and sometimes opposing, outcomes to Lindquist's. We develop and employ a new manual-based experimental task and match Lindquist's findings exactly. Interestingly, we also find a positive effect of voice on performance. Perhaps the learning effect in our study, building a six-piece paper quilt as compared to a 26-piece toy castle in Lindquist (1995), is much smaller. In that sense, participants producing paper quilts might better represent production workers who already know their jobs. If so, paper quilt making is likely a more appropriate manual-based task than castle making for experimental studies.
A potential limitation of our research is that the replication is conducted in Europe, opening the possibility of cultural differences between our population and Lindquist's (1995) U.S. subject pool. The preponderance of national culture research, however, suggests that residents of Western and Central Europe share traits with those of the United States (e.g., Congden et al. 2009). It is also possible that experimenter bias could be driving our results. We did, however, make every effort to avoid predisposition in our findings by assigning each experimenter to conduct only two of the four conditions. We chose two conditions because becoming an expert in all four would have required more than the one and a half hours of training provided, and we did not want experimenters to be confused as to which condition they were executing. In our study, we also assume that participants are motivated to earn a piece-rate incentive and that the truth-inducing incentive scheme motivates them to set a standard at the maximum attainable level to maximize their income. It is conceivable, however, that participants' self-noted attainable ranges of production ability are not purely a measure of ability but also of motivation. In that sense, participants might be yoked (matched) not only on ability but on motivation as well. Finally, as in all laboratory experiments, researchers sacrifice some external validity for internal validity in the controlled environment.
A logical and necessary extension of our research would directly test manual- versus mental-based tasks. Specifically, in the organizational justice context of this paper, researchers could measure the impact of voice and vote on satisfaction and performance using paper quilt making versus symbol decoding. That study would also likely include measures of motivation (intrinsic versus extrinsic). A study measuring differences in outcomes between manual and mental tasks could be conducted in any of the social sciences and would add value to methods research. Another extension of our study could involve using the same manual task to measure the impact of various experimental operationalizations of voice and vote on satisfaction and performance. Such a study could advance understanding of how different experimental metrics of procedural justice affect satisfaction and performance. It could also serve as a methods analysis regarding the efficacy of making inferences about theoretical constructs when they are operationalized in multiple ways. It would also be interesting to experimentally test learning differences as they relate to the amount of time it takes to learn new manual versus mental tasks. This contrast could also include similar measurements among various types of tasks.
In summary, a field of study has emerged from Lindquist's (1995) introduction of concepts of justice to the accounting literature. Subsequent research has examined many additional moderators and mediators of the impact of justice on satisfaction and performance. Our findings suggest researchers should reflect on the soundness of their methods as they pursue future replications or extensions of seminal works to ensure they have captured the same underlying concepts. Only then can confident conclusions and inferences result.