1 Introduction

Corporate scandals such as Enron, Worldcom, or Comroad as well as the financial crisis in 2008/2009 have strongly focused attention on compliance and ethical behavior in the business context (Berings and Adriaenssens 2012; Elm and Radin 2012; Gunia et al. 2012). Companies implement compliance programs as part of their management control systems to make sure that business decisions are in line with company regulations, ethical principles, and the law (Ferrell et al. 2017). Management control systems aim at behavioral changes of employees toward more compliant behavior using action controls, personnel controls, and cultural controls (Merchant and Van der Stede 2012). While action controls include rules and procedures, which provide concrete behavioral guidance, personnel controls enhance employees’ understanding of what is expected in terms of compliance. Cultural controls consist of norms and values shared by organizational members.

In companies, codes of conduct, compliance training, and whistle-blowing systems are core elements of compliance programs (Weber and Wasieleski 2013) aiming at implementing such controls into daily practice. While codes define the company’s self-imposed compliance framework, compliance training helps employees to understand and apply this framework in their daily work. While whistle-blowing is an instrument to detect code violations ex post, it also influences behavior by making detection of misconduct more likely.

Regarding the scope as well as the coherence of compliance programs and their relationship to observed unethical behavior, ethical intentions, and whistle-blowing among employees, some survey-based research exists. As an early example, Weaver et al. (1999) show that a higher level of commitment to compliance among the top management leads to more comprehensive compliance programs. The scope of a compliance program is also relevant for compliance. A study by Kaptein (2015) among 5065 employees from US companies finds considerable evidence that a more comprehensive compliance program (i.e., combining various elements such as codes, training, whistle-blowing) is related to less observed unethical behavior. Furthermore, Ruiz et al. (2015), using 525 employees in the banking and insurance industry in Spain, show that all elements of a company’s compliance programs should be deeply entrenched within the organizational decision-making process, because a higher degree of coherence between norms and practice is strongly and positively related to ethical intentions in the face of ethical dilemmas.

Despite the generally positive results of implementing compliance programs in terms of achieved compliance, there are several research gaps. For instance, while the studies consider the existence of various elements of compliance programs (Is there a code? Is there compliance training?), they do not consider the many possibilities for designing a specific element that occur in practice. When looking at codes, there exist many normative guidelines on how to design them (e.g., Gibbs 2003). They should include examples, be very clear, and have an appropriate ‘tone from the top’ (Barth 2003). In practice, codes differ in these aspects (Kaptein 2004), any of which might affect effectiveness. Moreover, there is a lack of experimental research on the effects of certain design elements, which would allow drawing causal conclusions (Weaver 1995). Disregarding a code’s design might also be one explanation for the mixed results found on code effectiveness (Kaptein and Schwartz 2008). As a company has substantial control over the design of its code and the other elements of its compliance program, the design is a crucial factor to consider (Cleek and Leonard 1998).

Thus, attributing differences in compliance among companies to differences in the compliance programs companies have is difficult, as individual elements of the compliance program differ in their design. But variation in compliance among companies is also due to the demographic composition of the company’s staff, its prevailing ethical culture, and size (O’Fallon and Butterfield 2005). Further, it is attributable to situational elements, e.g., the temptation involved (Schwartz 2001), a fact which might account for differences among business units and functions within a company but also among companies from different economic sectors. All this makes it difficult to compare compliance levels among companies in a way that would allow attributing differences in compliance to differences in compliance programs. Therefore, it makes sense to investigate drivers of compliance within a company to eliminate confounders or at least control for them by holding them constant.

To gain a deeper understanding of how the compliance program affects compliance, we address three research questions: first, are codes per se an effective core element of compliance programs? Second, are there design elements of a code that increase compliance? Third, do types of compliance training and whistle-blowing channels as further elements of compliance programs matter?

To obtain empirical evidence, we conducted a factorial survey experiment, manipulating design elements of codes (e.g., adding examples) and the inclusion of a further element of compliance programs (whistle-blowing). In addition, we asked whether participants attended company-based compliance training as another program element. Then, we tested how a code per se, its specific design, and whistle-blowing affect managers’ ethical intent and willingness to report misconduct. A random sample of 4659 managers was recruited from a multinational European corporation as a part of a larger research project on the effectiveness of compliance programs. The net sample consisted of 1005 respondents, a response rate of 21.57%.

Our empirical findings show that it is not the code per se but its design that matters for compliance. Some design elements, like positive pictorial illustrations, or elements providing guidance, as opposed to imposing sanctions, are highly effective, while others are not. Explicitly mentioning whistle-blowing leads neither to more ethical intent nor to higher willingness to report misconduct compared to codes without this element. Regarding compliance training, evidence indicates that for a manager’s ethical intentions and willingness to engage in whistle-blowing, repeated compliance training matters, in particular training that is specific rather than general.

We contribute to the business ethics literature by providing a better understanding of how to make use of a compliance program’s design to improve compliance. As many of the features we study can be implemented easily and at very low cost for the company, this study immediately provides design options for practitioners. In addition, we identify functional and personal features relevant for compliance, which can help companies to identify populations in need of enhanced compliance training.

The remainder of our paper is structured as follows. The next section briefly summarizes the relevant literature, before we present our theoretical background and derive our hypotheses. Based on these, we present our research design followed by the empirical findings. We conclude by critically discussing our findings, deriving the paper’s contribution to theory and practice, describing the study’s limitations and outlining possible avenues for future research.

2 Literature review: codes, compliance training, and whistle-blowing

The most frequent elements of compliance programs are codes of conduct, compliance training, and whistle-blowing. Consequently, these aspects constitute the main subjects of our study.

A code of conduct is a typical starting point of compliance programs (Kaptein 2015). It sets forth regulations of the company (e.g., rules, principles, values) relating to compliance issues and, thus, provides decision guidance as well as behavioral expectations to all organizational members, so that they deal with compliance situations in the desired way (Kaptein and Schwartz 2008). According to Kaptein (2015), the code’s content is important beyond immediate behavioral guidance, as it “provides the content of an ethics program, the basis on which all other components are built” (p. 420). The question arises, however, whether codes per se are effective in influencing behavior.

An extensive body of empirical literature has investigated code effectiveness, more than on any other element of compliance programs (McLeod et al. 2016). Overall, the literature reports mixed results as to whether having a code can make a difference. On the one hand, some studies found support for a positive link between the existence of a code and desirable outcomes such as ethical attitudes (e.g., McKinney et al. 2010; Adams et al. 2001), ethical behavioral intentions (e.g., Ruiz et al. 2015; Boo and Koh 2001), a reduced level of unethical work behavior (e.g., Kaptein 2015), or less pressure to engage in unethical conduct (e.g., Peterson 2002). For example, according to the study by McKinney et al. (2010), business professionals whose company has a code in place are more likely to perceive ethically questionable business practices concerning a variety of stakeholders as less acceptable than those who work in a company that does not have a code. This leads to the conclusion that codes are effective means to influence business professionals’ ethical attitudes. On the other hand, there are also several studies showing no significant relationship between having a code and ethical attitudes, ethical awareness, or ethical behavior (e.g., Rottig et al. 2011; O’Leary and Stewart 2007; Marnburg 2000). For instance, Kohut and Corriher (1994) survey 86 MBA students and conclude that codes do not influence ethical judgments in ethical dilemmas. As an explanation, they suggest that there is a lack of communication as most companies merely distribute their codes using employee handbooks. Goodell (1994) even found that employees who were bound by a code perceive the ethics of their company and management as more negative and feel themselves subjected to intense pressure to compromise their code.

These contradictory results support the conclusion that investigating the effects of the mere existence of codes is not enough. They hint at more complex mechanisms by which codes do or do not work. First, a lot of variation in ethical decision-making is due to demographics or personality traits (see, e.g., O’Fallon and Butterfield 2005). Compliance is, of course, also attributable to the situation at hand, say, the stakes and the temptation involved (Schwartz 2001). While each of these factors matters for behavior, each of them may also affect the relevance of a code, making it easier or more difficult to affect behavior using a code. Second, relating to properties of the company and its compliance program, there are several factors influencing the effectiveness of codes such as the top management’s support (Kaptein 2011), the elements of compliance programs which supplement the code (Kaptein 2015; Ruiz et al. 2015) and general organizational characteristics like the perceived ethical climate (Kish-Gephart et al. 2010).

What we are particularly interested in is that code effectiveness might vary because the codes themselves differ. Kaptein (2004) found that codes can vary considerably in topics and values addressed as well as the tone in which they are written, and another factor deemed important for compliance is the formal design of a code (Kaptein 2015; Cassell et al. 1997; Molander 1987). As it is, there are many normative recommendations on how to design a code, e.g., by including clarifying examples (Barth 2003), a statement from the top management to demonstrate commitment to the code (Benson 1989), or adding internal whistle-blowing to detect code violations (Gibbs 2003). However, while there are convincing arguments for these recommendations, there is only little empirical evidence on the actual effectiveness of design elements. Schwartz (2004) conducted 57 in-depth interviews with code users to identify those design elements of a code that the interviewees perceive to make a code more effective. These design elements include a code written in a negative tone, using many examples to clarify a code’s content, or justifying the code. As a company has substantial control over the design of its code as well as other elements of its compliance program, the design is a crucial factor to consider (Cleek and Leonard 1998) as it represents an accessible way to improve compliance.

Given the inconclusive evaluation of code effectiveness and the variation of codes in practice, one may conclude that disregarding a code’s design might be one explanation for the mixed results on the effectiveness of codes: it is not just the existence of a code, but its design that matters. However, when engaging in comparative research on the relevance of design features of codes, the multitude of factors in which codes differ among companies makes it difficult to attribute differences in compliance to differences in codes. What would be needed to attribute effectiveness to features are codes that are identical apart from one feature. Furthermore, the companies surveyed may also differ in other regards that are relevant for compliance, such as ethical culture, economic sector, or the other elements of the compliance program installed. All this indicates that survey-based research comparing companies faces substantial difficulties in attributing differences in compliance to differences in codes. Experimental approaches seem more appropriate, but Weaver (1995) already drew attention to the fact that there is a lack of experimental research as to whether design elements actually matter and, to the best of our knowledge, this gap persists (McLeod et al. 2016).

While a code is a central element of a compliance program, it is not sufficient to achieve compliance. Anecdotal evidence already illustrates why: while Schwartz (2001) emphasizes that organizational members need to have a basic knowledge of the code’s regulations as a key prerequisite for making decisions in accordance with them, Adams et al. (2001) found that many of their respondents were unable to cite the more specific content of their code, apart from the general commandment to be honest and to act ethically. As an instrument of ensuring conformity with the code, Weber (1993) emphasizes the importance of compliance training, because when organizational members do not know that a code exists (Warren et al. 2014) or are unfamiliar with its content (Wotruba et al. 2001), it is highly unlikely that the mere existence of a code will affect their behavior. In practice, compliance training is meant to improve this situation. According to Kaptein (2015), compliance training is the second most frequently used component (76.34%) of compliance programs in his survey. Another survey by the Ethics Resource Center (2014) shows that 81% of the surveyed participants from US-based companies indicate that they have compliance training in place. However, Weber and Wasieleski (2013) revealed that training sessions often take place only once a year, last less than 1 h and have different objectives, ranging from increasing ethical awareness to imparting specific knowledge to deal with ethical dilemmas.

As for the content of compliance training, Ferrell et al. (2017) conclude that effective compliance training “must start with a theoretical foundation based on values, a code of conduct, procedures for airing ethical concerns, line and staff involvements, and clear executive priorities on ethics, all of which must be communicated to employees” (p. 229). Compliance training aims at justifying the code, emphasizing its importance for business decisions, demonstrating ‘correct’ behavior in concrete examples, increasing awareness of what constitutes ethical dilemmas and how to cope with them, as well as providing decision guidance on how to use the code in daily business (Kaptein 2008). As for the practice in companies, two surveys among members of the Ethics and Compliance Officers Association provided evidence that increasing awareness of ethical issues inherent to business situations, the existing ethics standards and how to apply the code in practice are the most frequently addressed purposes of compliance training programs (Weber 2015; Weber and Wasieleski 2013). Additionally, compliance training is also a signal to employees that “the organization not only offers specific skills to managers, but also indirectly communicates that ethical behavior is valued and that ethical dimensions should be considered in decision-making” (Treviño 1990, p. 207). Looking at how compliance training is conducted in practice, several modes occur but eLearning models and group discussions are the most frequently used types (Weber 2015; Weber and Wasieleski 2013).

The importance of training and communication for compliance has, by and large, been supported in the empirical literature (Kaptein 2015; Rottig et al. 2011). On its basis, it can be concluded that compliance training is indeed positively related to ethical decision-making (e.g., Ruiz et al. 2015; Warren et al. 2014). Nevertheless, results reveal that the effectiveness of training crucially depends on various factors such as the type of training (Harkrider et al. 2013; Schlaefli et al. 1985), the level of (in)formal training (Verma et al. 2016), the duration and frequency of the training (Schwartz 2004; Schlaefli et al. 1985), and the combination of training with other elements of the compliance program (Ruiz et al. 2015). The results also imply a need for tailor-made compliance training to address different needs (Benishek and Salas 2014).

To summarize, one can say that compliance training works, but a range of factors influence its effectiveness. These factors should be considered when designing compliance training programs, which vary in practice among companies. However, empirically based insights on which to base the design of compliance training are lacking. Thus, the question remains how compliance training should optimally be organized. Here, we are interested in the specificity and frequency of training.

Compliance may also be increased by consistent enforcement of the compliance program (Ferrell et al. 2017; Barth 2003). In this regard, whistle-blowing is an important enforcement instrument (Molander 1987). According to Near and Miceli (1985, p. 4), whistle-blowing describes “the disclosure by organization members (former or current) of illegal, immoral or illegitimate practices under the control of their employers, to persons or organizations that may be able to effect action”. This definition comprises internal as well as external forms of whistle-blowing (Near et al. 2004). While internal whistle-blowing refers to reporting channels inside the organization (compliance officers), external whistle-blowing typically refers to reporting channels outside the organization (media, public prosecutors) (Miceli et al. 2008; Near et al. 2004). The theoretical case for the compliance-increasing effect of whistle-blowing is strong: it increases the chance of getting caught by establishing peer surveillance. As Treviño et al. (1999) put it, “[e]mployees are the organization’s first line of defense against ethical or legal problems because they are most likely to know about violations of the law or of ethical guidelines” (p. 134). However, observing misconduct is one thing; blowing the whistle is another. The effectiveness of whistle-blowing depends on the probability that employees are willing to report observed misconduct (Kaplan et al. 2009). According to a study by the Ethics Resource Center (2014) among US employees, more than four out of ten respondents observed misconduct (e.g., abusive behavior, conflicts of interest) in the past 12 months. Only 63% of these actually reported it, most of them (92%) through internal reporting channels (e.g., the direct supervisor).
This is in line with a qualitative study by Schwartz (2004) among code users, which shows that a person’s decision to blow the whistle is subject to several contextual factors such as the whistle-blower’s proximity to the wrongdoer and his or her position in the company, whether there are bystanders or whether the whistle-blower fears retaliation. There is also empirical evidence that personal characteristics influence the willingness to report misconduct, e.g., sex, age or job level (Mesmer-Magnus and Viswesvaran 2005). Furthermore, the whistle-blowing channel might also be a sensitive issue. Gao et al. (2015) found that business students are more likely to blow the whistle when the whistle-blowing hotline is managed outside the company by an external call center provider, as opposed to the department of internal audit within the company. Interestingly, Kaplan et al. (2009) conducted a study among 91 MBA students, which came to the opposite result. Given this observation, and the fact that the stakes for whistle-blowers in a real company are much higher than for students participating in an experiment, it is still an open question under what conditions employees are willing to report observed misconduct and what companies can do to improve this.

In addition to the open questions regarding the substantive issues, there are also methodological challenges. The studies reviewed here most often use surveys (e.g., Kaptein 2015) or interviews (e.g., Schwartz 2004), and according to a review of the organizational ethics research by McLeod et al. (2016), there is only scarce experimental research on compliance programs and compliance, resulting in a lack of valid statements about causality. For instance, when investigating the effectiveness of a code’s design on compliance, the research design chosen should assure that the results can unambiguously be interpreted: is the behavior in a company with a (particularly designed) code more compliant, because it has an effective code or because having an effective code made it more attractive for more ethically minded employees? Further, comparing compliance among firms whose codes differ in many regards makes it difficult to attribute differences in compliance to specific differences in the codes. Therefore, it seems to be appropriate to answer the research question with an experiment.

3 Hypotheses development

A first hypothesis concerns the effect of a code. According to Treviño and Weaver (2003), a code serves as a signal that the company expects its members to behave in a compliant way. Thus, giving employees a code has potentially two main effects: first, to increase their ethical awareness and second, to make them more likely to comply with the code per se (Rest et al. 1986). Therefore, we postulate a first and very basic hypothesis:

Hypothesis 1

Providing decision-makers with a code of conduct per se leads to more compliant decisions compared to providing no code.

Arguments and empirical evidence put forward in particular by Mazar et al. (2008) but also Adams et al. (2001) imply that a code primarily serves as an ‘ethical reminder’ and nothing more. If this is true, differences in the design of codes should not matter. Contrary to this view, we argue that a code’s design can make a difference and that differences in code design account for differences in code effectiveness in terms of compliance achieved. There are many recommendations on how to design codes. For instance, the regulations of codes should be accompanied by pictures (Cressey and Moore 1983) and examples (Barth 2003), the regulations themselves should be clear-cut, thus allowing limited discretion (Kaptein 1998) and codes should include a foreword by corporate representatives (Benson 1989). Such design elements aim at making the code’s content clear and catchy and, thus, increase its relevance. But there is little empirical evidence as to whether such design elements actually have an impact (Weaver 1995). What behavioral mechanisms may explain the impact of design elements which increase clarity and unambiguity of codes on ethical decision-making? Rational choice theory (Becker 1968), the theory of self-concept maintenance (Mazar et al. 2008), cognitive load theory (Sweller 1988), and dual-coding theory (Paivio 1979) provide valuable insights.

The economic approach to behavior, notably rational choice theory, models the decision to violate norms, such as those laid down in a code, as the result of a rational calculation, where benefits and costs are traded off (Becker 1968). This calculus is based on the expected benefits of violating the norm, the expected costs of being sanctioned, and the probability that the violation will be detected. In an organizational setting, a company can influence a person’s cost–benefit calculus, for example by adding a foreword to its code in which the company’s top management takes a clear stance on the issue of compliance. In terms of costs and benefits of non-compliance, such a statement indicates that the top management is fully behind the rules and is, thus, likely to enforce the code, which makes the threat of sanctions more credible.
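
As a stylized illustration of this calculus, assume a risk-neutral decision-maker and let $B$ denote the expected benefit of violating the norm, $S$ the cost of the sanction, and $p$ the perceived probability that the violation is detected and sanctioned. This notation is introduced here purely for illustration and is not taken from Becker (1968). In this simplified sketch, violating the norm is attractive only if

\[
B > p \cdot S .
\]

A foreword that makes enforcement more credible can then be read as raising the perceived probability from $p_0$ to some $p_1 > p_0$, which raises the expected cost $p \cdot S$ and thereby shrinks the set of situations in which the inequality holds.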

Because not all decisions are merely rational calculations, the theory of self-concept maintenance (Mazar et al. 2008) expands this view by adding a further decision criterion: the maintenance of a person’s self-concept, notably as a good or bad person. According to this approach, people cope with situations, including deviant behavior that generally would negatively impact their self-concept, for instance, by re-categorizing the situation. This means that if the deviant behavior falls below a certain limit, people will deviate without compromising their self-concept in a negative way (Mazar et al. 2008). Transferred to the design of a code, a code of conduct that defines its regulations in a clear-cut way, for instance by setting forth specific values for accepting and granting gifts, constitutes clear behavioral limits. Such a clear and straightforward code makes it more difficult for people to ‘bend’ the regulations of the code by leaving little to no room for interpretation. Thus, when violating a code’s norm, it is more likely that a person’s self-concept is negatively affected. As persons strive to maintain a self-concept as a moral person (Mazar et al. 2008), it is more likely that they abide by the code.

The last two theories focus on how codes can be designed to be understood and learned more easily, both of which, in turn, increase relevance. First, cognitive load theory (Sweller 1988) can be used to explain that using worked examples, which “consist of a problem formulation and the final solution” (Renkl 2014, p. 392), is an effective and efficient way to promote learning. With worked examples, there is a shift from ‘means-end analysis’ typically used by ‘novices’ to solve unfamiliar problems, which induces high extraneous cognitive load, to a deep understanding of the problem, which allows transferability, i.e., the creation of schemata (Sweller 1988), which makes it easier to apply known rules to new situations. Applying the worked-examples approach to the design of codes, we posit that including examples in the code helps code users (e.g., employees) to better understand the underlying rules, principles, and the general intent of the code. While the qualitative study by Schwartz (2004) indicates that code users perceive examples as a factor that clarifies a code’s content and, thus, improves comprehension, experimental evidence on this is lacking.

Finally, the dual coding theory developed by Paivio (1979) postulates that mental information is formed by non-verbal (e.g., pictorial) as well as verbal (e.g., textual) codes. The two kinds of codes are represented and processed by different cognitive subsystems, i.e., ‘logogens’ which process verbal codes and ‘imagens’ which process non-verbal codes. Even though these cognitive processes are distinct representational units, they can be interconnected. In addition, the theory predicts that when pictorial and verbal codes occur in combination, the chance that information is retained and recalled when needed increases. The latter aspect is relevant for learning purposes, such as text comprehension (Clark and Paivio 1991), and, thus, can be applied to the design of codes. When supplementing written regulations of a code with appropriate pictures that capture the essence of each regulation, it is more likely that people can retain and recall the information when needed as the information is ‘dual coded’, textual and pictorial. Thus, by adding pictorial information, a code can be learned and retained more easily, which increases its effectiveness. While a review by Butcher (2014) shows that combining visual and verbal stimuli leads to better outcomes (retention and transfer) compared to merely having textual stimuli, the dual coding argument was mostly applied by studies using undergraduate students (e.g., Cuevas et al. 2002) or pupils (e.g., Peeck 1974), but never to the context of codes of conduct. Here, evidence is lacking. Based on these arguments, we assume that codes designed in a ‘clear’ way (e.g., using examples, pictures, or a foreword from the top management) positively impact ethical decision-making, leading to our second hypothesis:

Hypothesis 2

Design elements of codes that increase clarity and unambiguity of the code’s prescriptions increase compliant decision-making.

Hypotheses about the effects of whistle-blowing regulations on misconduct can draw on some of the arguments made above. From a rational choice perspective, whistle-blowing works by making actors who consider misconduct more aware of the risk of being caught if they actually engage in misconduct. They have to worry about being found out not only by the formal supervisory arrangements, but also by their co-workers and subordinates. This informal “supervision” is close and, moreover, constant. We will not discuss whistle-blowing addressing the press and public prosecutors, as this is fully independent of how the company organizes whistle-blowing, on which we focus. At the level of the firm, including whistle-blowing in the code increases the perceived likelihood that negative consequences of misconduct, whatever they are, will actually occur, making misconduct more expensive and, thus, less likely. This general effect can be modified by the mode in which whistle-blowing is organized. The organization of whistle-blowing differs among companies and this too matters for the calculus of potential malefactors. The company can organize whistle-blowing internally, by setting up points of contact in the firm, for instance the compliance department. Alternatively, the company can organize whistle-blowing externally, by setting up points of contact outside of the firm, like an independent law office. From a rational choice perspective, one major obstacle to whistle-blowing is the potential costs for the whistle-blower. If the whistle-blower’s identity becomes known in the company/peer-group, there might be informal retaliations, up to mobbing, which impose serious consequences on the whistle-blower, to the point that s/he has to quit employment. Reducing these costs will increase the likelihood that someone who observes misconduct will actually blow the whistle.
Regarding these costs, internally and externally organized whistle-blowing differ substantially, in particular in the guarantees regarding the anonymity of the whistle-blower. In case of internally administered whistle-blowing, the expected costs for the whistle-blower are higher because the risk that co-workers find out about the whistle-blower’s actions is higher. Even if anonymity is formally granted, the identity of the whistle-blower is known to someone in the company, and informal networks and gossip may make the whistle-blower’s identity known in his or her department. Higher potential costs for the whistle-blower imply that the chances that whistle-blowing actually occurs are lower, which in turn lowers the expected costs of engaging in misconduct: if a manager knows that everyone around him hesitates to blow the whistle on him because he may punish the whistle-blower, the deterrence effect of whistle-blowing is reduced or absent. In the case of externally administered whistle-blowing, the costs for the whistle-blower are lower. The whistle-blower’s identity is not known to anyone inside the company, but only to an external institution, typically a law firm, which is subject to strict regulations regarding passing on information. Retaliation is less likely and, thus, the chances that whistle-blowing will occur increase, which in turn increases its effectiveness as a control mechanism. Potential malefactors are aware that those around them can blow the whistle on them without being afraid of retaliation. We formulate the following hypotheses:

Hypothesis 3a

Including whistle-blowing in a code makes the code more effective in inducing compliant decision-making compared to codes without whistle-blowing.

Hypothesis 3b

Externally administered whistle-blowing makes the code more effective in inducing compliant decisions than internally administered whistle-blowing.

Complementary to the direct impact of whistle-blowing regulations in a code on misconduct is their impact on whether people report observed misconduct. Reporting misconduct has always been possible and has always occurred. Ultimately, a person needs an initial motivation to blow the whistle on someone. This motivation may arise from indignation about observed misconduct or other sources, but is a personal feature and as such exogenous to the organizational setting. Still, the organizational setting affects how the exogenous motivation to blow the whistle translates into actual whistle-blowing. The idea underlying formalized whistle-blowing systems is to use mutual control as a reliable element of the company’s control and compliance system. However, as we elaborated above, the whistle-blower will consider the costs and risks involved in blowing the whistle on someone, in particular retaliation. Nevertheless, companies can affect how employees deal with observing misconduct. First, mentioning the possibility to blow the whistle on observed misconduct will bring this option to the attention of employees, signaling that blowing the whistle is legitimate and indeed expected. This per se should increase the likelihood of reporting misconduct. Second, the way whistle-blowing is organized affects the potential costs of blowing the whistle, and the company may, as we elaborated in the above paragraph, lower them by choosing a particular setting. Ceteris paribus, the chances that whistle-blowing occurs are higher when whistle-blowing is organized externally rather than internally. Based on this argument, we formulate the following hypotheses:

Hypothesis 4a

Including whistle-blowing in a code increases the willingness to report misconduct compared to codes that do not contain such a regulation.

Hypothesis 4b

Externally administered whistle-blowing makes the code more effective in increasing the willingness to report misconduct than internally administered whistle-blowing.

Compliance training is crucial to implement codes by demonstrating how the abstract rules of the code are applied in practice: for which situations is the code relevant, and, if it is relevant, what behavior does the code prescribe? From a sociological perspective, norms are internalized by actors during the socialization processes by which they became a member of a group, in our setting a company or department (Campbell 1964). People learn from observing others, and they may be socialized into groups where codes are valid guidance for behavior, but also into groups, where ignoring codes is the norm. We are interested in what companies can do to train their employees in terms of compliance and ethical decision-making, as merely issuing a code to employees is not enough. Persons need to be familiarized with its content and intent, and formalized compliance training is the typical way to achieve this.

Compliance training can be organized very differently: one dimension in which companies differ is the frequency of training. There are companies where employees receive only one initial training on joining the company; other companies have highly developed training schedules, where training is repeated, say once a year. A first argument, directly following from basic learning theory, is that repetition improves knowledge. The more often the training is repeated, the better employees know the code, which is a precondition, firstly for recognizing that a specific situation is subject to the code and secondly, for being able to apply the code to the situation.

In terms of the themes covered and the training methods employed, compliance training differs too. Some companies have a very general training, which remains abstract, basically repeating the content of the code. Other companies have elaborate schedules of themes, like handling travel expenses, issues of procurement, and dealing with competitors, where the regulations in the code are taught using examples. Weber denotes this as example-based learning (Weber 2007). In line with cognitive load theory (Sweller 1988), the use of worked examples facilitates learning and the construction of schemas stored in long-term memory. The specific example is turned into a more abstract schema, which can be recalled by a person without high cognitive load whenever a situation at hand roughly matches prototypical or similar situations. By repeatedly applying the abstract rules in the code to more specific examples, training participants establish their own abstract schemas. As these are the results of personal cognitive efforts, they are more effective and more relevant for the persons than learning the abstract principles in the code. These two lines of reasoning, on training frequency and on training specificity, lead to two hypotheses:

Hypothesis 5a

The more compliance training a person had, the more compliant this person’s decision-making is.

Hypothesis 5b

Thematically specific compliance training is more effective at increasing compliant decision-making than general compliance training.

4 Research design and variable measurement

4.1 Participants and experimental setting

To test our hypotheses, we conducted a factorial survey experiment recruiting managers from a company as participants. Factorial surveys combine survey and experimental methods (Rossi and Anderson 1982). Participants were randomly assigned to one of eleven groups: ten groups each received a version of a code of conduct of a fictitious company called ‘TenCon’,Footnote 2 while the control group received no code at all. All code versions have the same content in terms of what rules are stated, but differ in that we manipulated a code’s design aiming at clarity and unambiguity, for example, by including pictures showing appropriate behavior. Another manipulation was the inclusion of internal or external whistle-blowing (“Appendix 1”). Compliance training was not manipulated, but differed among participants, both in terms of how often and in terms of what type of training they attended. In line with the hypotheses developed above, we tested how codes with certain design features, the administration of whistle-blowing, and participation in compliance training affect participants’ behavioral intentions regarding compliance. To capture compliance, participants were required to indicate their behavioral intention in several scenarios containing ethical dilemmas (vignettes, see “Appendix 2”). The ethical dilemmas covered issues of high practical relevance and were developed in cooperation with the company’s compliance officers, assuring a high degree of external validity. For each dilemma, the company’s current code of conduct stated what constitutes appropriate behavior, allowing us to recognize compliance and non-compliance.

In contrast to student samples, which are often used in ethics research, managers have to deal with ethical dilemmas like the ones used here on a regular basis, have encountered codes of conduct, and have undergone compliance training. Thus, the study combines an appropriate sample with a highly realistic setting, which results in high external validity. At the level of the research design, a problem of comparative business ethics research on compliance is the difference in corporate ethical culture and the compliance programs which were discussed above. The variation in both elements precludes attributing differences in compliance found to differences in compliance programs, like the design of codes or the organization of compliance training. We chose to hold the various confounding variables such as the ‘living’ ethical culture or company characteristics constant, by covering only one company. Conducting an experiment in which different versions of an identical code are used allows us to attribute differences in compliance to the design differences in the codes, as factors like corporate culture or the nature of the company are constant for all participants.

We invited 4659 managers from a multinational European company to participate in our study, which was implemented online, using Unipark. Participants were contacted by the company, based on a random selection of managers from all regions (e.g., Germany, Asia Pacific), hierarchy levels, and corporate functions (e.g., R&D, Sales). In February 2015, participants received a first email from a member of the company’s top management announcing the study, followed by a second email with the link to the Unipark site, sent out between February and March 2015. A reminder was sent mid-March. The final sample consisted of 1005 respondents, yielding a response rate of 21.57%. On arriving at the study’s landing page, participants could choose freely between an English and a German version. Next, they were randomly assigned either to a code condition, in which they received a code and immediately completed a short training on their code,Footnote 3 or to the control group, which received no code and no training. Subsequently, they had to deal with a series of ethical dilemmas. The study concluded with a post-experimental questionnaire including a manipulation check, demographics, personal values, occupational features, and the participation in company-based compliance trainings (Fig. 1).

Fig. 1 Procedure of the research design

4.2 Experimental manipulations

We manipulated several design elements of a code of conduct and the inclusion of whistle-blowing channels as part of the code (see “Appendix 1” for a detailed list). The content of each code was exactly the same and based on the company’s current code.

To test Hypothesis 2, we aimed at increasing a code’s clarity and unambiguity, using several design options discussed in the abovementioned normative literature on how codes should be designed. First, with respect to triggering dual coding, we created two codes including pictures demonstrating either how to behave (‘positive pictures’) or how not to behave in a specific situation (‘negative pictures’). Second, we included a foreword to the code, in which the company’s top management took a stance on compliance. In one version, the statement highlighted sanctions and the prohibitive character of the code (‘boundary systems’, see Simons 1995 for this notion). In another version, the foreword was positively formulated, emphasizing the character of guidance and highlighting company values (‘beliefs systems’, see again Simons 1995). The third pair of manipulations, oriented at the notion of example-based learning (Weber 2007), included examples illustrating compliant (‘positive examples’) or non-compliant behavior (‘negative examples’). The last pair of code versions focused on unambiguity and consisted of putting the same basic content in a form that makes clear that there is very limited discretion and little, if any, leeway. This code was presented either signed by the fictitious management board, or not signed at all (denoted as ‘limited discretion with signature’ and ‘limited discretion w/o signature’).

To test Hypotheses 3a, 3b and 4a, 4b, on the effects of whistle-blowing, we manipulated the occurrence of a whistle-blowing channel on top of the code’s basic content. To test Hypotheses 3b and 4b, we did so in two variants: whistle-blowing was either administered internally (i.e., whistle-blowers are asked to contact the company’s compliance officer), or externally (i.e., whistle-blowers are asked to contact an independent lawyer charged by the company). Hypotheses 5a, 5b on the effects of compliance training cannot be tested using ad hoc manipulations, as one cannot realistically manipulate attendance at compliance training. However, attendance at compliance training varies among participants, and we can treat this variation as a quasi-experimental manipulation. By surveying the number of attended training sessions and their specificity (i.e., general or specific training, such as antitrust training), we can test for effects of compliance training on compliance.
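The resulting set of experimental conditions can be summarized in a short sketch. The condition labels follow the manipulations named in the text, but the identifier spellings, the seed, and the equal allocation probabilities are assumptions made for illustration; the study does not report how the randomization was implemented in Unipark.

```python
import random

# Ten code versions named in the text (identifier spellings are illustrative).
CODE_CONDITIONS = [
    "positive_pictures", "negative_pictures",
    "boundary_systems", "beliefs_systems",
    "positive_examples", "negative_examples",
    "limited_discretion_with_signature", "limited_discretion_without_signature",
    "internal_whistle_blowing", "external_whistle_blowing",
]
# Eleven groups in total: ten code versions plus the no-code control group.
CONDITIONS = CODE_CONDITIONS + ["no_code"]


def assign_condition(rng):
    """Draw one condition with equal probability (an assumption)."""
    return rng.choice(CONDITIONS)


rng = random.Random(2015)  # hypothetical seed, only for reproducibility
assignments = [assign_condition(rng) for _ in range(1005)]
```

Under simple randomization like this, group sizes will differ somewhat by chance, which is one reason to report per-condition sample sizes alongside the descriptive statistics.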

4.3 Dependent variables

The variables of interest are compliance and reporting of misconduct. Both are highly sensitive topics, which do not allow for direct questions. In line with previous research (e.g., Ruiz et al. 2015; Elango et al. 2010), we use the vignette approach to get valid information. The main characteristic of this approach is that ethics and compliance are never mentioned in the study. Instead, participants are given situations, where ethics and personal interests collide, and they are asked how they intend to decide. Given that the rules, as established in the company, and basic ethical considerations indicate what is ethical and in particular, what is compliant behavior, compliance and ethical behavior can easily be captured. Due to the setting of an online experiment, we have to measure managers’ intentions as a proxy for their actual behavior. While not identical, the strong link between intention and actual behavior is supported by meta-analyses conducted, e.g., by Kautonen et al. (2013) or Armitage and Conner (2001). The situations presented all entail ethical dilemmas. Ethical dilemmas occurring in managerial practice are manifold and it is not possible to present a comprehensive list. Therefore, we chose a set of six vignettes/situations, which were indicated by the compliance office of the company to be their most relevant compliance issues. These vignettes address a person’s ethical behavioral intentions but for four there is also company policy in place, which demands a certain behavior. Thus, for our analysis, compliance and ethical behavior can be treated synonymously. The remaining two vignettes relate to a person’s whistle-blowing intentions.

4.3.1 Intention to behave unethically

The four scenarios addressing ethical behavioral intentions cover an invitation to a luxury trip, travel expense accounting, dealing with insider information, and an invitation to a business dinner (for a detailed description, see “Appendix 2”). For each ethical dilemma, the participant was asked how likely it is that s/he would engage in the unethical behavior described. Answers were given on a six-point Likert scale ranging from 1 (very unlikely) to 6 (very likely); higher values indicate higher levels of unethical behavioral intentions and lower compliance. The answers were combined to a summary measure of the participants’ unethical behavioral intention.

4.3.2 Intention to report misconduct

This was measured using two scenarios. The first concerned the misappropriation of company property by a colleague; the second describes a case of a line manager demanding a kick-back payment. For each scenario, the participant was asked how likely it is that s/he would report the misconduct. Answers were given on a six-point Likert scale ranging from 1 (very unlikely) to 6 (very likely); higher values indicate stronger intentions to engage in whistle-blowing. Again, the answers were combined to a summary measure of the participants’ intention to engage in whistle-blowing.
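As a minimal sketch of how such summary measures can be built, the following assumes an unweighted sum of the six-point items. The Results section describes the reporting measure as a 2 to 12-point scale, which is consistent with summing two items; whether the four-item measure was summed or averaged is not stated, so the sum is an assumption here. The function name and the example responses are hypothetical.

```python
def summary_score(item_responses, n_items):
    """Sum Likert items (each an integer from 1 to 6) into one score."""
    if len(item_responses) != n_items:
        raise ValueError("expected %d responses" % n_items)
    for response in item_responses:
        if response not in range(1, 7):
            raise ValueError("responses must lie between 1 and 6")
    return sum(item_responses)


# Four behavioral dilemmas: possible range 4 to 24 (higher = less compliant).
unethical_intention = summary_score([2, 1, 3, 1], n_items=4)

# Two reporting dilemmas: possible range 2 to 12 (higher = more reporting).
reporting_intention = summary_score([5, 6], n_items=2)
```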

We applied the criteria developed by MacKenzie et al. (2005) to differentiate formative from reflective constructs and evaluated both constructs as formative measures: while the overarching topic of all dilemmas is ethics and compliance in general terms, ‘ethicality’ is not a personal trait but also depends on the situation at hand (McDonald 2000). There is ample evidence that a person can comply with the code in one situation but not abide by it in another one (e.g., Valentine and Hollingworth 2012; Sweeney and Costello 2009). Consequently, a person’s behavioral intention for one situation is not a good predictor for the behavioral intention in other situations. Thus, different dilemmas are not interchangeable, but define distinctive aspects of the construct ethicality/compliance. As we distinguish between the intention to behave ethically and to engage in whistle-blowing, we conducted a factor analysis with all ethical dilemmas for the whole sample and a separate one for managers who did not receive a code (“Appendix 3”). Both factor analyses supported the two-factor structure with the first four ethical dilemmas relating to the intention to behave ethically and the last two dilemmas addressing the intention to engage in whistle-blowing.

4.4 Control variables

Previous research indicates that personal variables, like demographics and personal values, matter for ethical decision-making (e.g., O’Fallon and Butterfield 2005; Kish-Gephart et al. 2010). Thus, to build a more comprehensive research model and control for possible confounding effects, our study concluded by asking questions about personal features, such as demographics (i.e., gender, mother tongue), personal values (i.e., religiosity), and occupational features (i.e., region, functional unit, hierarchy level, work experience). These background variables allowed us to identify variations in compliance among the functional units of the company (e.g., sales, R&D), or at different hierarchy levels (e.g., middle vs. top management). There is empirical evidence that intention measures can be subject to a social desirability bias (e.g., Randall and Fernandes 1991). Therefore, we control for impression management, a form of social desirability bias, using a scale developed by Paulhus (1991, 1984) and a German version developed by Winkler et al. (2006), Cronbach’s alpha = 0.411. See “Appendix 4” and “Appendix 5” for all variables used in this study and the correlation table.

4.5 Data analysis technique

To test our hypotheses, we performed OLS regression analysis. We assessed the appropriateness of the OLS regression assumptions of linearity, homoscedasticity and normality according to Keith (2015). The bivariate relationships between interval-scaled independent and dependent variables are linear. All other independent variables are dummy coded. Normality of residuals is assessed using normal p–p plots taken from full models (Model 4 in Tables 2, 3, 4). There is some departure from the 45° line, but within acceptable limits. Inspecting scatterplots of residuals obtained from full models and the independent variables indicates that the variance of error terms increases for higher values of the dependent variables. Running Breusch-Pagan tests for homoscedasticity shows that there is significant heteroscedasticity in the sample. For this reason, we use the HC3 standard error estimator (Hayes and Cai 2007) in our regression models. Furthermore, we used bootstrapping (Efron and Tibshirani 1986) as a robustness check for our OLS regression models with 5000 draws (Hair et al. 2014) and the bias-corrected and accelerated confidence interval (Efron 1987). The results are fairly consistent.Footnote 4
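To illustrate what the HC3 estimator does, the following minimal sketch fits a regression with an intercept and a single dummy regressor and computes the heteroscedasticity-consistent covariance (X'X)^-1 X' diag(e_i^2 / (1 - h_i)^2) X (X'X)^-1, where e_i are the residuals and h_i the leverages. It is not the authors' code: the function name `ols_hc3` and the data are hypothetical, and the actual models contain many more regressors.

```python
import math


def ols_hc3(x, y):
    """OLS for y = b0 + b1 * x with HC3 standard errors.

    Returns ((b0, b1), (se_b0, se_b1)).
    """
    n = len(x)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    # Inverse of the 2x2 matrix X'X.
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    b0 = inv[0][0] * sy + inv[0][1] * sxy
    b1 = inv[1][0] * sy + inv[1][1] * sxy

    # "Meat" of the sandwich: squared residuals inflated by leverage.
    meat = [[0.0, 0.0], [0.0, 0.0]]
    for xi, yi in zip(x, y):
        resid = yi - b0 - b1 * xi
        leverage = inv[0][0] + 2 * inv[0][1] * xi + inv[1][1] * xi * xi
        weight = resid * resid / (1 - leverage) ** 2
        row = (1.0, xi)
        for r in range(2):
            for c in range(2):
                meat[r][c] += weight * row[r] * row[c]

    # cov = inv @ meat @ inv
    tmp = [[sum(inv[r][k] * meat[k][c] for k in range(2)) for c in range(2)]
           for r in range(2)]
    cov = [[sum(tmp[r][k] * inv[k][c] for k in range(2)) for c in range(2)]
           for r in range(2)]
    return (b0, b1), (math.sqrt(cov[0][0]), math.sqrt(cov[1][1]))


# Invented data: 0 = no code, 1 = code; outcome = reporting intention (2 to 12).
has_code = [0, 0, 0, 0, 1, 1, 1, 1, 1]
reporting = [9, 10, 8, 9, 10, 11, 9, 10, 10]
coefs, ses = ols_hc3(has_code, reporting)
```

With a single dummy regressor, the intercept equals the mean of the reference group and the slope equals the difference in group means, which mirrors how the no-code baseline is read off the intercept in the regression tables.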

5 Results

We performed manipulation checks to see whether our manipulations worked as intended. First, participants who received a code had to do a short training based on their code to make sure that they were familiar with it. We conducted a Kruskal–Wallis test to see whether participants are more familiar with certain code versions. Results show that participants are equally familiar with their code, no matter which code variant they received, H(9) = 12.629, p = 0.180. The average code familiarity is at a high level of 0.935 (93.5%) (SD 0.096). Second, we tested whether participants recognized the manipulation we added to their code by adding a code-specific item to the post-experimental questionnaire (e.g., ‘The TenCon Business Conduct Guidelines contain examples of misconduct’, which is relevant for codes including negative examples). Across all design manipulations and whistle-blowing channels, on average 87.80% evaluate the statements correctly by indicating a 4–6 on a 6-point Likert scale ranging from 1 (strongly disagree) to 6 (strongly agree). Thus, we conclude that our manipulations worked as intended.
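For readers unfamiliar with the Kruskal–Wallis test, the following sketch computes the H statistic with the standard tie correction and the associated degrees of freedom (number of groups minus one, hence 9 for ten code versions). The function name and the data are hypothetical, and the p value, which requires the chi-square distribution, is left out to keep the example free of third-party dependencies.

```python
def kruskal_wallis_h(groups):
    """Return (H, df) for a list of samples, using midranks for ties."""
    pooled = sorted(value for group in groups for value in group)
    n_total = len(pooled)

    # Assign each distinct value the average of the ranks it occupies.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        ties = j - i
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += ties ** 3 - ties
        i = j

    rank_sums = sum(
        sum(rank_of[value] for value in group) ** 2 / len(group)
        for group in groups
    )
    h = 12 / (n_total * (n_total + 1)) * rank_sums - 3 * (n_total + 1)
    correction = 1 - tie_term / (n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction, len(groups) - 1


# Invented familiarity scores for three code versions.
h_stat, df = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```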

Table 1 presents descriptive statistics of the dependent variables for the study’s manipulations including the no-code baseline condition. Having a code, as opposed to having none, reduces on average managers’ intentions to behave unethically and increases their willingness to report misconduct. However, a closer look reveals this to be not the case for all codes alike. Certain design elements as well as including whistle-blowing in the code make a difference for both behavioral factors. For example, the code using limitations to make the code’s regulations clearer is most effective, followed by the code including pictures showing appropriate behavior. The latter code has also the second highest mean value regarding managers’ intention to engage in whistle-blowing. While both whistle-blowing channels are of similar relevance for managers’ behavioral intention, a code including external whistle-blowing, as opposed to internal whistle-blowing, is more effective in inducing managers to report misconduct.

Table 1 Descriptive statistics by manipulation and baseline condition

We begin by testing Hypothesis 1 and the set of Hypotheses 5a, 5b. The first hypothesis postulated that presenting any code to decision-makers leads to a more ethical behavioral intention. In this context, Hypotheses 5a, 5b are closely related. Hypothesis 5a posited that decision-makers who attended more company-based compliance training state more ethical intentions. Furthermore, Hypothesis 5b assumed that specific compliance training (e.g., anti-corruption) is more effective in fostering ethical intent than general compliance training sessions.Footnote 5 To test these hypotheses, we regressed unethical behavioral intention on the presence of a code, company-based compliance trainings (number and specificity), and control variables (Table 2).

Table 2 The impact of a code and compliance training on the intention to behave unethically

Regarding our first hypothesis, we find empirical support that the existence of a code (i.e., without considering its design) is negatively related to the intention to behave unethically. In other words, presenting a code to the managers before making a decision significantly reduces their intention to behave in an unethical way. This effect is robust against including control variables (Models 1–4). Furthermore, looking at the coefficients in Model 2, we see that managers who attended company-based compliance training tend to state intentions more in line with the compliance policy of the company. In addition, looking at the types of training (Models 3 and 4), general as opposed to specific, we see that having attended a specific training is much more effective than the general training for improving managers’ ethical intentions. However, when including control variables, the positive impact of attendance at compliance training sessions (not tabulated) and of the specificity of training on ethical intentions disappears. Therefore, Hypotheses 5a/5b are not supported. Note that these results of company-based compliance trainings on managers’ unethical behavioral intentions remain identical when focusing on a code’s design (Table 3).

Table 3 Impact of a code’s design elements, whistle-blowing, and compliance training on the intention to behave unethically

Regarding the control variables, we see that there are only two features affecting unethical intent in a significant way. First, compared to Germany, the reference category, only Asia–Pacific differs in terms of ethical behavioral intention. On average, managers in the Asia–Pacific region exhibit behavioral tendencies which are more unethical and less in line with the company’s compliance policy. Second, we see that managers at higher hierarchical levels intend to behave more ethically. We also control for personal properties (e.g., age, gender, work experience) but, as none of those is significant, the results are not reported here.

Next, we test Hypothesis 2, which stated that it is not the existence of a code per se (e.g., as a kind of ethical reminder) but a code’s design which makes the difference. If a code were just an ethical reminder, its content and design should not matter; all code variants should be equally effective. But we presumed that codes which are made easier to learn, are formulated more clearly, or provide limited discretion lead to a more ethical intention. To test this hypothesis, we regressed unethical behavioral intention on a code’s design, the frequency and specificity of compliance training, and control variables (Table 3).

Overall, there is empirical evidence that a code’s design matters for ethical intention. A code which is clearly written and provides only limited discretion has a statistically significant positive impact on the managers’ ethical intent (denoted as dLimited discretion w/o signature). This is also true for a code using pictures illustrating appropriate behavior (denoted as dPositive pictures) as well as a foreword by the top management emphasizing the importance of adhering to the company’s code and values (denoted as dBeliefs systems). All these findings are robust against modifications of the model and remain qualitatively unchanged when including relevant control variables. Contrary to our expectations, using examples in the code does not matter (denoted as dPositive/negative examples); the same is true for pictures showing inappropriate behavior (denoted as dNegative pictures) and for a foreword by the top management emphasizing not to violate the company’s code (denoted as dBoundary systems). Thus, our second hypothesis is only partly supported, as only some modifications improve code effectiveness.

From a methodological point of view, it is important to note that it is not the case that presenting any code (i.e., irrespective of its design) has an impact, as some codes do not work. Consequently, we have to put our first hypothesis in perspective, as it is not the existence of a code but a code’s design which makes the difference.

Regarding the control variables, we see that there are only two features affecting non-compliance in a significant way. Compared to Germany, the reference category, managers in the Asia–Pacific region on average intend to behave more unethically and less compliantly. When controlling for the hierarchical level, we see that managers at higher hierarchical levels intend to behave more ethically and more compliantly.

As an outlook on the impact of whistle-blowing on unethical behavioral intention which will be discussed below, we also report effects for both whistle-blowing channels. Results show that only in the case of a code highlighting internally administered whistle-blowing, a more ethical intention—compared to the no-code control group—can be observed when control variables are included. This is not the case for a code highlighting externally administered whistle-blowing.

Now, we turn to the hypotheses dealing with whistle-blowing and unethical intent (H3a and H3b). Hypothesis 3a assumed that including whistle-blowing in a code increases ethical intent compared to codes without explicitly mentioning whistle-blowing. Furthermore, Hypothesis 3b posited the superiority of external whistle-blowing over internal whistle-blowing when it comes to ethical intent. A t test comparing the two codes including whistle-blowing and the eight codes without whistle-blowing shows that Hypothesis 3a cannot be supported (t = 0.467, df = 908). Nor can Hypothesis 3b, about the relative strength of external compared to internal whistle-blowing, be supported: as we see in Model 1 of Table 3, both coefficients are of about equal magnitude and there is no significant difference in ethical intention between the group with internal and the group with external whistle-blowing (t = 0.100, df = 183).
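The group comparisons above can be sketched as a two-sample t test. The version below assumes a pooled-variance (Student) test with n1 + n2 - 2 degrees of freedom; the text does not state which variant was used, so this is an assumption, and the function name and data are invented for illustration.

```python
import math
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    n1, n2 = len(sample_a), len(sample_b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(sample_a)
                  + (n2 - 1) * variance(sample_b)) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample_a) - mean(sample_b)) / std_error, df


# Invented unethical-intention scores for two whistle-blowing conditions.
internal = [1, 2, 3]
external = [2, 3, 4]
t_stat, t_df = pooled_t(internal, external)
```

A t value close to zero, as reported for both comparisons, indicates that the difference in group means is small relative to its standard error.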

Finally, we test our hypotheses on whistle-blowing. H4a postulated that including whistle-blowing in the code increases managers’ willingness to report misconduct. H4b stated that externally administered whistle-blowing is more effective in increasing managers’ intention to report misconduct than internal whistle-blowing. To test these hypotheses, we regressed the intention to blow the whistle on design elements of codes, compliance training, and control variables (Table 4). To separate the effect of having a code from the effect of having a code that includes whistle-blowing, we compare the two codes highlighting whistle-blowing with the other eight codes without whistle-blowing, which serve as the reference group.

Table 4 The impact of whistle-blowing and compliance training on the intention to report misconduct

Table 4 shows that even if managers did not receive a code, they are highly willing to report observed misconduct. The intercept of Model 1, which gives the expected value of the dependent variable for the no-code group, is 9.189 on a 2 to 12-point scale. Presenting a code—regardless of its design or dealings with whistle-blowing—increases this willingness significantly, albeit marginally by 0.57 points on the scale.

Are codes highlighting whistle-blowing more effective than other codes? The results, see Model 2, do not conclusively support Hypothesis 4a. Including some form of whistle-blowing does not per se make a code more effective in a statistically significant way. If this were the case, both codes with whistle-blowing should be better than codes without such regulations. Differentiating between internal and external whistle-blowing, we see further that there are no robust differences in effectiveness. While external whistle-blowing has a stronger impact than internal whistle-blowing, this difference is small and diminishes, once control variables are included, an effect which is, to some degree, attributable to cases which drop out due to incomplete information. Thus, Hypothesis 4b cannot be supported.

Looking at background variables, see Models 3 and 4, reveals that the intention to report misconduct varies to some degree by local culture: compared to the reference category Germany, it is higher in Europe outside Germany and in the Americas. Note that care must be taken when interpreting the highly significant coefficient for the Americas, as there were only 17 responses from this region. Willingness to report misconduct is lower among participants from R&D, but apart from that, we cannot identify functional domains or hierarchical levels where the readiness to report misconduct is systematically different from the company average.

Regarding company-based compliance training (Model 3 and 4 in Table 4), we see that managers who attended more training sessions indicate a higher intention to report misconduct. Furthermore, looking at the types of training attended, general vs. specific, we see that having attended a specific training is much more relevant than the general training for managers’ intention to report misconduct. When including relevant control variables, both results remain qualitatively the same. Thus, in total, Hypotheses 5a/5b are partially supported.

6 Discussion and conclusion

Our study addressed the impact of compliance programs’ design on corporate ethics and compliance based on a sample of 1005 managers from a European multinational company. Of practical interest was whether the formal design of codes, training, and whistle-blowing matters for compliance and whistle-blowing. Various studies recommend specific formal design elements as a means to improve compliance, but empirical evidence on their effectiveness is lacking. Such evidence would allow companies to improve their compliance programs’ effectiveness by adding design elements of proven relevance. Three questions guided our study: first, are codes per se an effective core element of compliance programs? Second, which design elements of a code have an additional impact on compliance? Third, do types of compliance training and whistle-blowing channels matter for compliance and reporting of misconduct?

We compared default compliance among managers who received no code ahead of participating with the compliance of groups of participants who received one of several codes with specific design features. Code vs. no code comparisons are standard in research on the effects of codes; see, e.g., Marnburg (2000). Prima facie, the presentation of a code increases ethical intent and compliance. But this might be due to the mere presence of a code, which is the ethical reminder argument of Mazar et al. (2008). In line with scholars who emphasize the need to consider a code’s content and design when investigating its effects (Kaptein 2011; Kaptein and Schwartz 2008), we argued that design matters.

If the ethical reminder argument were true, all code variants should work equally well, as it would be the existence of a code that matters, not its content or design. When considering the role of code design for effectiveness, we find that this is not the case. Some code versions are highly effective; other code versions are no better than no code. The effect we find when pooling all codes is due to some codes being highly effective.

A first design element increasing effectiveness is a foreword by the top management that emphasizes compliance in a positive tone. The opposite, a foreword emphasizing the obligation to comply and threatening sanctions (boundary systems, see Simons 1995 for this notion), is not effective in increasing ethical intent. Code users are more receptive to a code with a foreword emphasizing the importance of adhering to the company’s code and values (similar to beliefs systems, see also Simons 1995). This corroborates earlier findings by Treviño et al. (1999): perceiving compliance programs primarily as a protective mechanism of the top management against organizational members’ non-compliance is negatively correlated with ethical conduct as well as with employee commitment toward compliance.

A second design element increasing code effectiveness is the use of pictures showing correct behavior. This is interesting insofar as the complement, pictures showing inappropriate behavior, has no such effect. While the argument was that the dual coding invoked by showing pictures increases effectiveness, one might presume that in the case of the negative pictures, the effect of the pictures is offset by their negative content, which outlines how actors shall not behave. Applying such a message in practice requires inverting it. In this regard, a consistent finding from learning theory is that messages that are formulated in a negative way are harder to retain and more likely to be mixed up with their opposite.

The third finding, that a code which sets very clear behavioral limits has a positive impact on compliance, fits with the rational choice arguments presented. Potential malefactors who might easily push the envelope in the case of vague rules can no longer do so. In particular, setting a limit for the value of a gift allows for a clear-cut evaluation of actions as appropriate or inappropriate. This finding is in line with a broad body of literature, which suggests the importance of writing the code in a clear and unequivocal fashion (e.g., Gibbs 2003; Murphy 1995).

Several other manipulations, all aiming at making the codes clear and unambiguous, were recognized by the participants but did not affect their compliance. For instance, no impact of including descriptive examples of (in)appropriate behavior was found, a feature recommended in the literature (e.g., Barth 2003). This indicates that using examples is a double-edged sword. Examples may work, but only for illustrating the specific case they cover. If the situation is not close to the example, it will not have an effect. However, examples will always be specific examples and a code, like any abstract regulation, cannot provide all possible examples. There seems to be a trade-off between providing catchy examples, which can directly support decision-making, and more abstract examples that increase managers’ awareness and convey the code’s intent, but require a higher level of abstraction and reflection for application.

Overall, these results on design features provide evidence that considering the formal design of a code is a worthwhile option for compliance departments, as some design elements may invest a code with effectiveness it might initially lack.

Beyond the formal design of a code, we investigated an element of a compliance program which focuses on detecting misconduct: whistle-blowing. We found that having a code increases managers’ willingness to report misconduct but were unable to improve this willingness by explicitly highlighting internally or externally administered whistle-blowing. A possible explanation for this result may be that not only the existence or type of whistle-blowing channel matters but also whether it is clear that the observed behavior constitutes a violation of the company’s code, an argument for which there is some empirical evidence (Schwartz 2004).

Another crucial element of compliance programs is compliance training. We did not manipulate training, as doing so would be unrealistic in the setting of an experiment, but used the variation that occurred ‘naturally’ among participants. Our results indicate that more compliance training sessions prior to the study lead to higher intentions to report misconduct but did nothing for the managers’ own ethical intent when relevant control variables were included. This is also the case for more specific compliance training sessions, which lead to a higher willingness to report misconduct. Comparing these effects of compliance training nicely illustrates a well-known dilemma in empirical studies on norms: training increases awareness of the problem of ethical (mis-)conduct and the scrutiny regarding the behavior of others, but does not really affect one’s own behavior.

Lastly, regarding background variables like regions, functions, or hierarchy levels, we were able to identify some with relevance for compliance. To some degree, there are regionally defined subcultures in the company, where certain forms of behavior are seen as more appropriate, even if they violate the code (e.g., in Asia–Pacific), or where reporting misconduct is the norm (e.g., in Europe except Germany, Americas). Compliance is typically higher in the upper echelons of the company. This is of importance as the top management should set the right tone (Posner and Schmidt 1992) and lead by example, and it is also in line with the argument by Kaptein and Schwartz (2008, p. 120) that “[a] code could even have a reverse effect when employees perceive no support of management for the code”. In addition, this finding is presumably due to the higher visibility of the upper management levels combined with a highly competitive climate, which implies higher reputational risks and higher scrutiny, also from other managers. The company’s compliance efforts, in particular in terms of compliance training, can use this information and intensify the training for the groups where problems are more pronounced. With respect to personal level variables, such as gender, work experience, or religiosity, we find no impact on either compliance or whistle-blowing.

With regard to the generalizability of the findings, some limitations need to be acknowledged. First of all, even though the situations used occur in practice, the overall setting of the study was hypothetical (e.g., a fictitious company, no real consequences for violations), which raises the problem of whether managers would behave similarly in reality. Very close to this issue is the limitation that the study measured intention instead of behavior. It is exceedingly difficult to capture behavior in the setting of an experiment, let alone an online factorial survey experiment like ours. Capturing behavioral intentions, which are an “immediate antecedent of behavior” (Ajzen 1985, p. 18), was the only feasible option. The objection remains that participants might state certain intentions because they presumed that certain answers were socially more acceptable than others, but would actually behave differently in real life. To counter this problem, we controlled for effects of social desirability. The effects found for codes, whistle-blowing, and compliance training are robust and not affected by considerations of social desirability.

A second limitation is that the code participants received was a one-page code of conduct and that they received it shortly before making their decisions: a code was presented, they were given the chance to familiarize themselves with it, and then they had to decide about various dilemmas. Theoretically, this might lead to overestimating the code’s relevance for decisions, as participants were, in a way, primed and reminded to consider ethics in general and the rules in the code in particular. However, it is not the case that all codes work, so the objection that we only observe a priming effect is not conclusive. Given the short exposure of participants to the code, one might even argue that the codes’ effectiveness is underestimated here and would be higher if people were more familiar with it.

A third limitation is that we studied one company, from a certain sector and with a certain compliance policy in place. The company invests a lot of effort into compliance, by having compulsory training in place and generally by emphasizing compliance. In companies from other sectors, compliance policies and the value assigned to compliance may differ, resulting in other absolute levels of compliance. This is a limitation in two regards. First, regarding statements about the level of compliance: given the emphasis of the company on compliance issues, the absolute level of compliance might be higher here than elsewhere. Second, in terms of how participants react to a code, conditional on their personal level of compliance, the strong focus on compliance may imply that participants reacted more strongly to the code, but also that they reacted less strongly to it. In the first case, the level of compliance is already so high that an additional effort (which is, in a way, the code presented in the experiment) will not make much of a difference. In the case that compliance is, despite the efforts of the company, disregarded, the code will not work, and the effect of the code is underestimated. These limitations have to be put into perspective. First, we are not interested in the absolute levels of compliance per se, only in reactions to codes presented as stimuli. Second, even in terms of absolute compliance, the participants are not a homogeneous group. There is substantial variation in compliance, but also in the background of the participants. They come from different levels of hierarchy, different functions, and different regions of the world, and these local cultures are reflected in the levels of compliance. The company is quite diverse, active in production as well as services. Thus, there are also several business models/sectors represented. As in any experimental design, heterogeneous groups do not limit the insights gained from the experiment; quite the contrary.

A fourth limitation is that there might be self-selection problems. Participation in the survey was voluntary, and it might be argued that managers who care more about compliance are both more likely to comply with company regulations and more likely to participate in the study, so the compliance reported might be overestimated. Unfortunately, there is little to be done about it. Further, one might argue that, if managers who are more sensitive to compliance themes participated more often, the estimated effect of codes might be affected, too. This may imply two things: first, the effect of codes is overestimated, because the sample reacts more sensitively to a code. But this argument is not in line with the finding that not all codes matter but only some. Second, the effect of the code is underestimated, because it is of less relevance for a sample of managers who already behave correctly. Lacking data on the degree to which self-selection occurred, there is little definitive to be said about this problem.

Our study contributes to theory and practice of business ethics. First, we address a relevant issue of business ethics as well as management control research: the effectiveness of codes of conduct, whistle-blowing channels, and compliance training as part of management control systems. Especially as existing research indicates the relevance of action as well as people controls for management control system effectiveness (Goebel and Weißenberger 2017), the effectiveness of codes of conduct is a matter of general interest. Second, our paper provides a better understanding of how a code’s design and further elements of compliance programs matter for both managers’ ethical intent and their willingness to report misconduct, which sheds light on the hitherto rather equivocal results on the effectiveness of codes. Contrary to suggestions of Mazar et al. (2008), we do not support the notion that a code is merely an ethical reminder, nor that the design or the content of a code does not matter (Adams et al. 2001). As Schwartz (2004, p. 326) put it, “[t]he primary deficiency in the recommendations [e.g., how to design, implement, and enforce codes] that have been made, however, is that they do not appear to be based on any empirical studies”. Our empirical study identifies several design elements which improve managers’ ethical intent. Third, we identify that highlighting whistle-blowing in the code is of limited relevance. Fourth, regarding compliance training, we find that managers who attended a greater number of different compliance training sessions (approximately 12 months before our study) intend to behave in a more compliant way. This is in particular the case for managers attending more specific compliance training sessions compared to more general ones. However, this effect holds only for managers’ intention to report misconduct, not for their own behavioral intentions.

Finally, we also contribute to business practice as we draw up guidelines for those who are responsible for implementing and revising compliance programs (e.g., compliance officers) on how to improve compliance: codes, in particular their design, and compliance training matter and constitute a means to increase compliance. Albeit of limited impact in terms of R², improvements in the design of codes are a possibility to improve compliance at almost zero cost. Furthermore, our findings identify corporate sub-cultures relevant for unethical intent, which can help companies to identify populations in need of enhanced compliance training.

By design of the factorial survey experiment, design elements of the codes are independent manipulations, so it is possible that the effects found for individual design elements add up when several of them are introduced in the same code, for instance, by including a foreword and pictures. However, further research is necessary to examine interaction effects of a code’s design elements. In doing so, other design elements of codes could also be considered, e.g., the length of a code (Schwartz 2004). In a similar vein, further studies can investigate the effects of combinations of elements of compliance programs (e.g., codes, whistle-blowing, incentive and/or accountability policies) and manipulate their design simultaneously. In addition, it is recommended that further research replicate this study with a broader range of ethical dilemmas, as situational issues can also influence the results, although this requires an even larger sample. Finally, as our sample is limited to managers, a further study should replicate the study using employees as well as managers, as a compliance program should affect all organizational members.