1 Introduction

1.1 Is There a Crisis in the Social Sciences?

The Guardian, a leading British newspaper, published a long-running series on abuses and exploitation in academia titled ‘Academics Anonymous’. From this series, we draw a few examples. In one piece (30 June 2017), the work of an unnamed researcher is described. It shows how the researcher tailored their papers not to fit the data, but to make them fit for publication in prestigious journals. Another researcher left out any data interpretations that could raise questions with the journal editors, because their supervisor had warned that such interpretations could lead reviewers to turn the paper down. In yet another example, a research supervisor informed their students that research begins not with asking questions but with selecting suitable, high-ranking journals and defining subjects that might fit into those journals. All these examples illustrate the less than perfect publication behavior currently practiced in some corners of academia.

It was precisely experience with such practices that incited cognitive neuroscientist Chris Chambers to write The Seven Deadly Sins of Psychology (2017), an indictment of the numerous research misbehaviors found in the field of psychology. These range from forms of bias and unreliability to fraud or even corruption, and many of them have already been discussed in previous chapters of this book (see especially Chaps. 6 and 7).

After working for nearly 10 years in the field, Chambers wrote in an op-ed for The Guardian (May 9th 2017) that he had come to understand that a psychologist’s mission wasn’t about truth seeking; it was about ‘crunching through as many experiments as possible as quickly as possible, finding ways to make ambiguous data look beautiful, publishing frequently in prestigious journals […] winning large public grants, and basically getting as famous and powerful as possible’.

Whether we call the state of affairs psychology finds itself in a ‘crisis,’ as some do, or more of a ‘challenge,’ may be a matter of opinion. Chambers found that not enough had been done to address the forces that fueled the problem (see our discussion of ‘academic capitalism’ in Chap. 8). An appeal to research ethics could be instrumental in transforming the field of psychology and in helping it to overcome this (perceived) crisis.

Whether this is the case will be the subject of debate in this chapter. We realize, however, that many of the discussions in this chapter are not part of a student’s lived experience. Many of the themes addressed here may seem ‘abstract,’ far removed from the reality of everyday life. To a degree this is true, but to a degree it isn’t, because the pressures, incentives, and political considerations that make up the fabric of modern academic life will reveal themselves in the educational practices in which students are immersed. This is why we have chosen to dedicate one chapter of our book to some of these broader topics.

1.2 System Approach

In this chapter, we will broaden our perspective on research ethics, and move away from considering how or why individual researchers decide to break from established norms and values, or from examining what happens when rigid guidelines meant to steer our ethical behaviors are not followed. Instead, we will look at the interrelated and interdependent parts that make up the whole of the scientific enterprise, of which ethics is just one element. This holistic approach is often referred to as a systems perspective (see Parsons 1951; Wiener 1965; Maturana and Varela 1980). By zooming out to a lens that can capture the system as a whole, we can consider how scientific practices and ethics mutually influence one another. This allows us to explore some of the most important internal factors that regulate scientific operations and shape our understanding of how ethics plays a role.

We begin with an exploration of what is currently considered one of the more challenging problems in the social sciences, namely the replication crisis. Next, we will examine two factors that are often closely connected to said crisis, both believed to have a corroding influence on ethics: publication pressure and perverse incentives. In the final section, we will focus on the role of teaching research ethics in the future as a means of countering the corroding effects of misconduct.

2 Replication Crisis

2.1 What Is a ‘Replication Crisis’?

The term replication crisis came into circulation around 2010, when it was observed that often, when studies were reproduced, the same findings could not be replicated. The replicated studies would find a smaller or larger effect than originally claimed, or none at all, or the direction of the effect had changed, or perhaps an entirely different effect was found altogether.

Derived from the natural sciences, the requirement that credible knowledge must consist of reproducible findings implies that studies carried out under the same conditions should generate the same results. Findings that cannot be reproduced are considered chance findings, or false positives, and do not belong in the scientific body of knowledge (Makel 2017).

The requirement of replicability is based on a nomothetic approach, meaning that it should be possible to discover laws that explain objective phenomena. Although not every social scientist accepts the requirement of replicability and its underlying nomothetic notion, those who do are committed to experimentation, clinical trials, and hypothesis testing under controlled conditions. The observation of a replication crisis calls into question the credibility of such research findings.

In fact, many of Daniel Kahneman’s well-known social priming studies (as detailed in Thinking, Fast and Slow, 2011) failed in replication, likely because the sample sizes had been too small. ‘I placed too much faith in underpowered studies,’ Nobel prize laureate Kahneman famously admitted (Retraction Watch, February 20th 2017). Kahneman is but one of many whose studies failed to be replicated (see Hughes 2018, for more examples).

What constitutes the ‘replication crisis’ is a digression from the ideal of reproducibility, resulting in the view that many research findings in certain disciplines (notably psychology, but also medicine) are unreliable, or perhaps even false. To better grasp the issues at hand, we discuss three aspects of the replication crisis: (1) lack of replication studies, (2) weak research rigor, and (3) failing replication.

2.2 Lack of Replication Studies

While replicability is accepted as a ‘fundamental principle’ by a large portion of the social science community, very few replication studies are actually carried out. This was noticed by Smith as early as 1970, while more recently, Makel et al. (2012) found an overall replication rate of psychological research at just over 1%. Additionally, many of these replication studies were performed by the original author. Why is there so little interest in replication studies? Let’s look into two possible reasons.

One rationale is theoretical sectarianism (favoritism of one’s own theories, and prejudice or discrimination against rival ones), to which many social scientific disciplines are prone. Theoretical sectarianism occurs because researchers operate under specific theoretical assumptions, or paradigms, as they were defined by Thomas Kuhn (1970). Kuhn conceptualized a change in theoretical assumptions as a paradigm shift. The problem is that paradigm shifts are rarely universal, with different disciplines regularly operating under different paradigms. Furthermore, there is often little common ground to be found between differing paradigms.

Without common ground (shared theories, mutually referenced authors, agreed upon methodologies), researchers are more inclined to confirm findings in their own domain than to disconfirm those of their peers in other domains. This implies that an interdisciplinary researcher who performs a replication study with unfamiliar methods is more vulnerable to the possibility (or accusation) of misunderstanding (see Box 9.1 for an example thereof).

A second rationale may be that journals prefer to publish ‘newsworthy findings’ over what is ‘already known’ (see the section on dissemination bias in Chap. 6 for further discussion). For this same reason, it is more difficult to get funding for replication studies, especially exact replication studies (same methodology, same conditions), as opposed to conceptual replication studies (same conceptualization, different methodology or different conditions).

The net result is that in the absence of adequate replication studies, we are much less sure that positive findings are in fact positive, and not accidental (see Fanelli 2010a for a discussion).

2.3 Weak Research Rigor

The replication crisis may be further worsened by weak research rigor. In 2011, Simmons et al. published a now famous study that shows how easy it is to produce false positives when the researcher’s degrees of freedom are exploited. Degrees of freedom pertain to a series of methodological choices researchers must make over the course of collecting and analyzing data, including the selection of dependent variables, determining sample size, using covariates (independent variables), and reporting subsets of experimental conditions (Simmons et al. 2011).

While staying within the accepted boundary of a .05 p-value, the authors succeeded in ‘proving,’ in a real experimental study with real subjects, the unlikely conclusion that listening to certain types of music makes you feel younger, and the obviously false conclusion that listening to certain other types of music actually makes you younger. They arrived at these conclusions by tacitly manipulating the researcher’s degrees of freedom (such as changing the number of participants without reporting this, or reporting on certain measures only). A reviewer unaware of these manipulations would have to accept the ‘age effect’ (that you actually become younger by listening to certain music) as genuine.

The point the authors sought to make was that as long as reviewers and readers are not informed about the researchers’ choices within their degrees of freedom (which can in and by themselves be legitimate), they cannot reasonably separate false from true findings. They therefore recommend more transparency about the choices made by researchers to avoid the creation of false positives.
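The mechanism can be illustrated with a small simulation. The sketch below is not the procedure used by Simmons et al.; it is a simplified illustration under stated assumptions: there is no true effect, each measure is independent, the standard deviation is known to be 1 (so a z-test can be used), and the hypothetical researcher reports a finding whenever at least one measure reaches p < .05. The function names and parameters are invented for this example.

```python
import random
from statistics import NormalDist, mean

ALPHA = 0.05  # conventional significance threshold


def p_value_two_groups(a, b):
    """Two-sided z-test for a difference in means.

    Simplifying assumption: both groups have equal size and a known SD of 1.
    """
    n = len(a)
    z = (mean(a) - mean(b)) / (2 / n) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_measures, n_per_group=20, n_studies=5000, seed=1):
    """Share of simulated null studies with at least one 'significant' measure."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        p_values = []
        for _ in range(n_measures):
            # Both groups come from the same distribution: no true effect.
            treatment = [rng.gauss(0, 1) for _ in range(n_per_group)]
            control = [rng.gauss(0, 1) for _ in range(n_per_group)]
            p_values.append(p_value_two_groups(treatment, control))
        # The researcher reports only the most favorable measure.
        if min(p_values) < ALPHA:
            hits += 1
    return hits / n_studies


def expected_rate(n_measures):
    """Analytical false-positive rate for independent measures."""
    return 1 - (1 - ALPHA) ** n_measures


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, round(false_positive_rate(k), 3), round(expected_rate(k), 3))
```

Under these assumptions, the analytical rate for three independent measures is 1 − 0.95³ ≈ 0.143, nearly three times the nominal 5%. A reader who sees only the one reported measure has no way of telling this from an honest single test, which is why transparency about such choices matters.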

2.4 Failing Replication?

Another seemingly significant blow was delivered in 2015, when a number of researchers undertook a large-scale attempt to replicate a swath of psychological studies. The results were sobering. At the initiative of Brian Nosek, a group of 270 authors (worldwide) replicated one hundred psychological studies, published in the course of 1 year (2008), in three top-ranking journals (Open Science Collaboration 2015).

Using a uniform replication protocol, all studies were carried out as exact replications as much as possible. The results were collected, assessed, and independently reviewed. Comparing the original studies with the replication studies, the researchers looked at significance and p-values, effect size, subjective assessments, and meta-analyses of effect size. What they found was that collectively, these indicators revealed that the replications produced significantly weaker evidence than the original findings. For example, the mean effect size of the replication studies was roughly half the magnitude of the original mean effect size (see Fig. 9.1).

Fig. 9.1
A scatter graph of replication effect size versus original effect size. It plots significant and nonsignificant p values with replication powers of 0.6, 0.7, 0.8, and 0.9 with a strong positive correlation in the significant values.

Original study effect size versus replication effect size. (© Science, 28 Aug. 2015, issue 6251)

The Open Science Collaboration project caused shockwaves that resonated beyond the scientific community. Newspapers all over the world reported that psychological studies fail to replicate. The authors themselves were much more careful, however. They argued that their study neither proved nor disproved anything. ‘The original studies examined here offered tentative evidence; the replications we conducted offered additional, confirmatory evidence. In some cases, the replications increase confidence in the reliability of the original results; in other cases, the replications suggest that more investigation is needed to establish the validity of the original findings. Scientific progress is a cumulative process of uncertainty reduction that can only succeed if science itself remains the greatest skeptic of its explanatory claims’ (Open Science Collaboration 2015, p. 4716–7).

Maxwell et al. (2015) similarly argued that when replication studies fail to show significant results, this should not lead to the premature conclusion that the original study was somehow faulty or flawed. In a reanalysis of the data collected by the Open Science team, Van Bavel et al. (2016) were able to demonstrate that some of the failure to reproduce the same findings could be attributed to contextual factors, having to do with social and cultural differences between the countries where the studies were carried out, illustrating just how difficult it is to replicate social science studies.

Box 9.1: Irreproducible Tears?

The issue of 14 January 2011 of the journal Science ran a paper by a team of Israeli scholars from the Weizmann Institute of Science at Rehovot that claimed that human tears may serve a chemosignaling function. After a team of Dutch scholars subsequently failed in their attempt to replicate the experiment described in the original article, a dispute broke out between the two groups of scientists, each accusing the other of misrepresentation. The case, which we describe below in some detail, illustrates the difficulties of replication in the social sciences.

The Israeli team, headed by Noam Sobel, hypothesized that human tears serve a chemosignaling function, such that men smelling or breathing in female tears become sexually less aroused. The idea is that tears contain certain chemicals that may influence brain activity in (heterosexual) men. In order to test this hypothesis, the researchers collected ‘donor tears’ from women who had watched sad films. They then exposed a group of males in a within-subjects design to these tears as well as to a substitute substance. After exposure, the participants were asked to rate their sexual attraction to pictures of female faces. In a second experiment, the researchers added another dependent variable, namely levels of psychophysiological arousal. The various studies revealed that exposure to female tears indeed has an effect, both on self-reported levels of sexual arousal (modest) and on objective psychophysiological expression (more pronounced) (Gelstein et al. 2011).

A group of mainly Dutch psychologists attempted to replicate these findings. One experiment was set up as an exact replication; the other two had alterations. Thus, in one condition, male participants were not only asked to rate sexual attraction, but also whether or not they would be willing to date the females in the pictures. In a second series of experiments, the researchers changed the design from a within-subjects study (which had low power) to a larger, combined within- and between-subjects design. However, they now asked the subjects to rate not only the pictures of a female face, but also pictures of the whole body as a measure of ‘arousal.’

Their attempts to ‘replicate and extend’ the original studies failed. They found no support for the chemosignaling function hypothesis, which they considered a possible false finding (Gračanin et al. 2017a, b, p. 149). The authors proposed instead that tears are functional in a social context, and that crying is a ‘self-soothing strategy’ (Gračanin et al. 2014).

Principal investigator Noam Sobel (2017) of the original Israeli study responded to the failed attempt at replication, arguing that the replication studies were not really replications of the original at all. The researchers had not operated from a proper chemosignaling laboratory, he argued. Further, he stated they had used different test materials (other films), which communicated a different feeling (not sadness), and had used combined datasets in an ‘inappropriate manner.’ Had they used the ‘appropriate techniques,’ Sobel believed they would have found that the data the Dutch team collected actually supported the original hypothesis. This prompted a further response from the Dutch replication team (Gračanin et al. 2017a, b), who argued that a theory that only holds under very specific circumstances is likely not a very good theory.

This dispute is of interest because it leaves open any of the following alternatives, namely that this ‘failure to replicate’ represents:

  1. An instance of theoretical sectarianism
  2. The social sciences’ context dependency
  3. Weak research rigor

Which of these alternatives do you believe is most likely?

3 Publication Pressure

3.1 Value for Money

In Chap. 8, we discussed how in the last quarter of the twentieth century, globalization and neoliberalism set in motion a trend towards the ‘valorization’ of science, which emphasizes the monetary value of scientific knowledge. Ties with external parties (both commercial organizations and governmental bodies) strengthened, from which a variety of conflicts emerged. But ‘valorization’ also impacted the way science was organized internally, including publication behavior, to which we now turn our attention.

The wish to demonstrate to the taxpaying public that scientists produce ‘value for money’ found its way into an administrative logic that demands adherence to uniform, quantifiable norms. University administrators and policy makers began thinking about ways to measure a researcher’s output in terms of ‘objective numbers’. One objective criterion they soon came up with was a count of articles published by individual researchers in prestigious journals. Later this was extended to a count of citations received. Both could be used as an approximation of a researcher’s objective impact on the scientific community.

These measures resulted in the establishment of new standards, such as a journal’s impact factor, a researcher’s citation impact, and their so-called h-index (see Box 9.2). Any of these standards, with a special focus on publishing success, became a measuring stick for success in academia (Barnard-Brak et al. 2011). Greater output and more citations equaled greater academic value. Unwittingly, these norms transformed the tenured job market in academia into a ‘market for prestige’ (Garvin 1980), thereby creating a whole series of new questions, which we will outline below (Fig. 9.2).

Fig. 9.2
A sketch of a man bending over and writing in a book with his left hand.

Publication pressure

Box 9.2: H-index (Adapted from Wikipedia)

The h-index (named after its inventor, Jorge Hirsch) is an author-level metric that attempts to measure both the productivity and citation impact of a scientist’s publications. The index is based on the scientist’s most cited papers and the number of times those papers were cited in other publications. The index can also be applied to the productivity and impact of a scholarly journal, as well as a group of scientists residing within the same department, university, or even country. The values on the h-index vary greatly from discipline to discipline, where the number of scientists, published papers, and citations strongly differ. In physics, for example, the average h-value for a full-time professor is between 15 and 20, in economics it is 7.6, and in sociology it is 3.7. In many academic communities, it was long believed that better scholarship equaled a higher h-index, but discussion has recently flared up about whether h-values actually represent quality.
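To make the metric concrete: under the standard definition, a scientist has index h if h of their papers have each been cited at least h times. The sketch below computes this from a list of per-paper citation counts; the function name and the example numbers are illustrative only and do not refer to any real researcher.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


if __name__ == "__main__":
    print(h_index([10, 8, 5, 4, 3]))  # 4: four papers with at least 4 citations
    print(h_index([25, 8, 5, 3, 3]))  # 3: one highly cited paper does not raise h
```

The second example shows a property that matters for the quality debate: a single very influential paper adds no more to the index than a moderately cited one.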

3.2 Publish or Perish

Has the pressure to publish in ‘high impact journals’ altered research norms? Moustafa (2014, p. 139) observed that the impact factor became ‘a major detrimental factor of quality, creating huge pressures on authors, editors, stakeholders and funders.’ More tragic still, Moustafa notes, is that the impact factor has also become the condition for allocating government funding to entire institutions in some countries.

Siegel and Baveye (2010) reason that scholars who wish to meet publication expectations will resort to a variety of techniques to increase their output and crank up their citation ranking. These include the use of co-authorships and so-called gift authorships (the author does not contribute to the research, or not significantly, but is included out of courtesy and in the expectation of reciprocity). Additionally, salami slicing techniques (slicing research such that several different papers can be written, all slightly varying around the same theme) and extensive referencing (‘I-cite-you, you-cite-me’) are employed to meet publication expectations (Box 9.3).

Fanelli and Larivière (2016) researched why publication rates have increased in the last quarter of the twentieth century. Looking into the work of over 40,000 researchers in Western countries, whose profiles were drawn from the Web of Science, they compared the publishing frequency between 1900 and 1998. They found that the average number of papers published throughout the twentieth century remained stable for most disciplines, and then visibly increased after 1980. However, so did the number of co-authors. Fractional productivity (the productivity of one researcher’s work spread out over multiple co-authored papers) remained stable. From this, the authors concluded that the widespread belief that pressure to publish causes the scientific literature to be flooded with salami-sliced, trivial, and false results is in fact incorrect, or at least exaggerated.
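The difference between a raw paper count and a fractional count can be shown with a short calculation. The sketch below assumes one common form of fractional counting, in which each paper contributes 1 divided by its number of authors to each author’s total; whether this matches the exact procedure of the study is an assumption, and the example figures are invented.

```python
from fractions import Fraction


def paper_counts(authors_per_paper):
    """Return (full count, fractional count) for one researcher.

    `authors_per_paper` lists the number of authors on each paper the
    researcher signed. Assumption: each paper is worth 1/n to each of
    its n authors.
    """
    full = len(authors_per_paper)
    fractional = sum(Fraction(1, n) for n in authors_per_paper)
    return full, fractional


if __name__ == "__main__":
    # Hypothetical researcher A: four single-authored papers.
    print(paper_counts([1, 1, 1, 1]))
    # Hypothetical researcher B: eight papers, each with two authors.
    print(paper_counts([2] * 8))
```

Researcher B has twice as many papers on their publication list as researcher A, yet both have a fractional count of 4. This is the pattern described above: more papers per author, accompanied by more authors per paper.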

In a similar examination, van Wesel (2016) looked at the output of journals rather than individual researchers. He examined 50 high impact scientific publication outlets selected from a variety of disciplines (including medicine, physiology, and psychology), to see whether publication behavior changed between 1997 and 2012. Van Wesel compared the number of authors listed and the number of references included, but also the text and abstract lengths, and even the presence of a colon in the title (as indicators of a paper’s ‘citability’). He found patterns similar to those of Fanelli and Larivière (2016), including a growing number of co-authors and increased referencing. Aware that these patterns could also be attributed to a change in editorial policies, van Wesel (2016, p. 212-3) believes it is nevertheless ‘not unrealistic to link the observed changes in publication behavior to a change in evaluation criteria.’

While the studies discussed above do not provide evidence that ‘pressure to publish’ leads to fraudulent behavior (Fanelli 2010b), there is reason to believe that publication norms have changed. Authors seem much more strategic in their publication behavior. Due to an awareness of the necessity to publish in order to further their academic careers, it seems likely that researchers plan their publications to meet these goals.

3.3 Intrinsic Versus Extrinsic Motivation

Does increasingly ‘strategic’ and ‘planned’ publishing behavior mean that authors are less intrinsically motivated? Hangel and Schmidt-Pfister (2017) interviewed over 90 researchers in Germany who were in different stages of their careers (PhD students, post-docs, tenured professors). The authors asked their participants frankly: ‘Why do you publish?’ They found that the most common reasons to publish contained both internal and external motivations, namely: (1) to communicate interesting research results; (2) to gain recognition among peers; (3) because they enjoy writing; and (4) to obtain funding in order to secure future research.

All researchers draw from a mix of these motivations, albeit differently in different stages of their careers. Those still in the early stages of their careers, when they are more dependent on their supervisors, are often highly aware of the incentive that one has to publish for mere survival. They cite motives 2 and 4 as most important. In the next stage of their career, many still feel the pressure to publish, but now more as a means to an end. These researchers seem more capable of enjoying the research process itself and more often cite reasons 1 and 3. Tenured professors almost always cite reason 4, claiming to publish for educational purposes and for academic survival.

Box 9.3: Slicing and Dicing: A Dilemma

A well-respected colleague proudly explains to you that he has managed to produce 12 publications out of the one dataset he collected for his dissertation. This is a particularly interesting achievement, as it involves a dataset with only 232 respondents to a four-page survey.

How do you respond?

  1. I think this is a great example to follow and I ask him how he achieved it.
  2. I cannot imagine each of these 12 papers having a unique contribution and vow never to go down this route.
  3. I tell the colleague that this is bad science and that I strongly disapprove of their actions.
  4. I think this is bad practice and is tainting the reputation of science. I inform the editors of at least the most recent of the 12 publications.

(Adapted from the Erasmus Dilemma Game)

4 Perverse Incentives

4.1 Reward Systems

Publication pressure represents one of the factors that contribute to what some believe to be a precarious situation in the social sciences. Ambition, external pressure, and weak research rigor seem to highlight an element of ‘crisis’ mentioned earlier in this chapter. Outside of publication pressure, the other factor consists of reward systems that aim to stimulate academic quality and simultaneously deal with decreased public funding.

Acting on a desire to transform universities along neoliberal lines into efficient, productive, and outstanding institutions, new public management administrators (see Box 9.4) set up national reward systems to select the best researchers. Performance-based Research Funding Systems (PRFS) have since been used to evaluate the quality of research proposals. They are based on the rationale that ‘funding should flow to the institutions where performance is manifest’ (Herbst 2007, p. 90).

These systems take into account any number of ‘performance indicators,’ such as the output of a research group, their citations received, their international ranking, and the judgements of their peers. For individual researchers, the number of successful grant applications is counted, along with their level of participation in international research associations, the number of keynote addresses they’ve made, board memberships, awards bestowed, and even their perceived societal impact (as expressed, for example, in their role as advisors for social organizations).

Perhaps unsurprisingly, researchers faced with the prospect of losing funding started to seek out ways to increase their performance accordingly. In an attempt to keep pace with, or even outdo, their colleagues, researchers began spending more time writing research proposals, but since each of these has a limited chance of being honored, the ‘battle for efficiency’ effectively resulted in an even greater waste of public resources.

PRFSs and other performance-based evaluation systems have unintended, so-called perverse effects when they encourage behavior that runs contrary to the original intentions (stimulation of excellence resulting in inefficient use of public funding) or invigorate undesirable behavior (competition for prestige, see Hicks 2012). Bouter (2015, p. 157) paints a grim picture of how certain incentives lead to a ‘monoculture focused on citation scores, short term economic gains, and government-defined growth sectors,’ with young talented researchers not being scouted. In the following sections, we discuss areas where such incentives can have perverse effects (see also Stone 2002).

4.2 Matthew Effect

Sociologist Robert Merton (1968) reasoned that symbolic as well as material rewards in science will have an accumulative effect. If a researcher ‘scores’ on any one of the criteria mentioned above, their chances of getting funded improve, and this effect will accumulate over time (also in a negative sense: if you don’t score, your chances diminish accordingly). The result is that ‘eminent scientists get disproportionately great credit for their contributions to science while relatively unknown ones tend to get disproportionately little for their occasionally comparable contributions’ (Merton 1988, p. 607). This is called the Matthew Effect, named after the Gospel of Matthew 25:29: ‘For to every one who has will more be given, and he will have abundance; but from him who has not, even what he has will be taken away.’
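The accumulative logic can be illustrated with a toy simulation. This is not a model proposed by Merton; it is a minimal sketch under invented assumptions: all researchers start out identical, one grant is awarded per round, and the chance of winning grows linearly with the number of grants already won. The function names, the `bonus` parameter, and all default values are hypothetical.

```python
import random


def simulate_grants(n_researchers=20, n_rounds=200, bonus=1.0, seed=42):
    """Award one grant per round, favoring researchers with earlier successes.

    Assumption: the weight of each researcher is 1 + bonus * (grants won so far).
    With bonus=0, every round is a fair lottery.
    """
    rng = random.Random(seed)
    grants = [0] * n_researchers
    for _ in range(n_rounds):
        weights = [1.0 + bonus * g for g in grants]
        winner = rng.choices(range(n_researchers), weights=weights, k=1)[0]
        grants[winner] += 1
    return grants


def top_share(grants, top_n=3):
    """Share of all grants held by the `top_n` most successful researchers."""
    return sum(sorted(grants, reverse=True)[:top_n]) / sum(grants)


if __name__ == "__main__":
    fair = simulate_grants(bonus=0.0)
    cumulative = simulate_grants(bonus=1.0)
    print("fair lottery, top-3 share:", round(top_share(fair), 2))
    print("cumulative advantage, top-3 share:", round(top_share(cumulative), 2))
```

Because all simulated researchers are equally able by construction, any concentration of grants that emerges when `bonus` is positive stems from early luck being compounded, not from differences in merit.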

There is some evidence that indicators of past performance correlate positively with chances of getting a grant application funded. This has been found in the fields of economics and the social sciences in the Netherlands (van den Besselaar and Leydesdorff 2009; Bornmann et al. 2010), although the correlation is low and is not consistent across all disciplines.

More worryingly, since the Matthew Effect contains an element of self-fulfilling prophecy, there is a danger of misrecognition attached to it, especially with young academics. When academics are overly assessed in the early stages of their careers, precocious students will have a far better chance of surviving the competitive struggle than their late-blooming peers, who may be just as brilliant.

4.3 Gap Between Tenured and Contingent Faculty Members

A further concern is that certain policies will negatively affect the academic labor market (Schwartz 2014). There are strong indications that neoliberal university politics, though meant to reward academic excellence, contribute to a divide between tenured staff with permanent positions and contingent faculty members who work on temporary contracts. This process is referred to as the adjunctification of higher education, or academia’s overreliance on temporary, non-tenured faculty members (Curnalia and Mermer 2018).

There are a number of reasons for this adjunctification. Increased focus on retention, career outcomes, and resource acquisition brought about a reduction in tenured positions. Once regarded as essential to protect academic freedom in the pursuit of knowledge, tenured positions today sometimes make up as little as 20 to 25% of all faculty staff in Western universities. The majority of instructors and teachers are hired on a contingent basis. They have little prospect of getting tenured but are assessed by and large along the same criteria as tenured staff, while their possibilities for doing research (and getting promoted) are diminishing. Teaching and research are thus increasingly being undertaken by different kinds of faculty (Finkelstein 2014). If nothing changes in the near future, it is to be expected that this divide will only grow deeper (see Dobbie and Robinson 2008) (Fig. 9.3).

Fig. 9.3
A sketch of a woman and a man with grim faces. The woman is standing behind the man.

If nothing changes, the gap will grow

In a survey among some 1500 higher education professors, deans, governing board members, campus administrators, policymakers, and other stakeholders in the United States, Kezar et al. (2015) found general agreement that the present system is untenable. It threatens academic freedom and undervalues teaching through its disproportionate emphasis on research. Most respondents agreed on the necessity to restructure teaching positions. More full-time faculty, differentiation of responsibilities, and an overarching need to restore professionalism to the role of faculty were among the most pressing needs voiced by administrators and professors alike.

Below, we give the floor to two adjunct faculty members who experienced this gap between tenured and contingent staff up close. One, a Dutch scholar in the social sciences, left academia after years of working on a temporary basis. In an email to her colleagues (quoted here with permission), she wrote: ‘I experience a large disconnect between what you are paid to do as a temporary staffer (teaching) and what determines your career options (research). This disconnect means that many contingent faculty members are almost forced to put in a lot of additional hours (i.e., evenings, weekends, using holidays to write papers, taking up parental/care leave to decrease the teaching load and use that time to work). If you do not want to go along with this, your research output will stay ‘behind’ compared to the competition, which makes you less attractive and decreases your chances on permanent positions, promotions, grants, etc.’

Similarly, an adjunct faculty member in England was quoted as saying: ‘I’ve watched brilliant friends be employed for two or three consecutive years with demanding teaching loads, travelling to cities hundreds of miles away with sharing childcare, only to be dropped for someone else with a more illustrious publication record’ (The Guardian, July 14th 2017).

4.4 Gender Gap?

Historically, women have been underrepresented in all disciplines in academia and at all stages in their careers. Many industrialized countries have adopted strong gender equality programs in research and innovation, and the gender gap has since grown smaller (Ceci et al. 2014). Despite this, there is still a ‘pipeline leakage,’ meaning that on the way to the top, women drop out more often than men (Huyer 2015). From the 2015 report She Figures, issued by the European Commission, we learn that in 2011 some 50% of all students in the social sciences and law in the EU, about 30–40% of researchers in these fields, and 29% of ‘top level’ researchers therein were women (with considerable differences per country).

Is the narrowing of the gender gap in higher education (in industrialized countries) an effect of government policies or does it happen independently? Ginther and Kahn (2009) compared the careers of male and female researchers in the US between 1973 and 2001. They examined the probability of obtaining a tenure-track position after getting a PhD and found that women were still less likely to take tenure positions in science. This was explained by ‘fertility decisions’ rather than reward systems – meaning that ‘women must face a choice between having children or succeeding in their scientific careers, while men do not face the same choices’ (Ginther and Kahn 2009, p. 183).

In non-industrialized countries (for example in parts of Africa), the gender gap persists, and here too it may be ascribed not to government policies but rather to cultural expectations. Ogbogu (2011), reviewing the gender gap in higher education in Nigeria, notes that recruitment and selection practices in Nigerian universities do not discriminate against women. Instead, factors such as lack of mentoring, poor compensation, family responsibilities, ‘and the ideology that women should have low career aspirations’ accounted for the observed disparity in academia.

4.5 Summing Up

In an attempt to keep up with and respond to worldwide changes in the political and economic landscapes within which universities operate, university administrators and government policy makers have set up procedures that aim to enhance productivity and excellence. These policies have had (and continue to have) intended as well as unintended consequences, resulting in a number of fundamental changes in the ways universities conduct research and provide education. These changes themselves pose new questions and challenges that must be addressed (Fig. 9.4).

Fig. 9.4
A poster of a hand holding a pen and a text, that reads, education is not a product, students are not customers, professors are not tools, and facilities are not factories.

© Cartoon by John Stuart Clark

Box 9.4: New Public Management

New Public Management (NPM) is the late twentieth century approach to public service organizations that suggested they be run like businesses. It is based on principles of expanding managerial freedom, flexibility of organizational structures, shifting staff and job conditions, emphasis on output and decrease of input (‘cost-effective management’), and increase of efficiency (see Christopher Hood, ‘A Public Management for All Seasons?’, 1991).

5 Teaching Research Ethics and Integrity

5.1 Ethics and Integrity Education

There is little doubt that ethics and integrity education is becoming increasingly important in universities, not least because of increasing demands and competition in the academic field (Brall et al. 2017). Universities have a commitment to prepare, guide, and mentor students through a litany of ethical issues: combatting scientific misconduct, addressing questionable research practices, applying specific procedures and regulations, learning to deal with newfound responsibilities belonging to certain roles, knowing how to accommodate a diversity of perspectives, and learning how to deal with external pressure (Naimi 2007). We concur with Resnik (1998, p. 174) that the question is not whether ‘ethics be taught?’, but ‘how can ethics be taught?’ In the sections below, we briefly discuss three broad approaches for how ethics can be taught.

5.2 Reactive Education

This approach is focused on the prevention of misconduct and misuse of procedures. To accomplish reactive education, many research institutions offer case-based approaches in their curricula (with either real or hypothetical dilemmas), where students learn to make judgement calls through structured discussions (Sponholz 2000). These discussions allow students to ‘evaluate conventions, define responsibilities, articulate positions on different issues, and acquire some facility at using a framework for ethical decision making’ (Stern and Elliott 1997). Canary (2007) shows that these approaches are successful in enhancing moral sensitivity, moral judgements, moral motivation, and moral character in students.

One recent development in this area is the emergence of eLearning tools. In a discussion of the issues related to the emergence of computer-based learning, Esposito (2012) finds that open networked learning environments ‘encourage a participatory research approach and therefore foster creative suggestions and shared solutions from participants in an evolving landscape of ethical opportunities and challenges’ (p. 323).

5.3 Proactive Education

This approach is focused on preparing students to actively participate in complex research environments and providing skills for adapting to changes in research policies. Proactive education makes use of role playing and simulation settings, which are used to train students to contribute to research themselves.

Sweet (1999) and Karkowski (2010) discuss the function of simulated review procedures with mock research proposals, used to prepare students to produce higher-quality research proposals themselves, with both authors finding these procedures helpful. Löfström (2016) discusses the use of role-playing strategies in academic integrity education, specifically staging panel discussions of realistic cases, which add value by facilitating perspective-taking and broadening a student’s worldview.

5.4 Reflexive Education

This approach aims at developing ethical awareness that is not restricted to institutional procedures. Working instead within a broader definition of research ethics that includes social, political, and moral dimensions, Von Unger (2016) emphasizes the need for more critical dialogue in ethics education. As an example, a course format is discussed where sociology students were trained to reflect on a case that had political relevance. The students collected their own data, engaged in critical inquiry, learned to formulate and revise their own assumptions, and thus learned to become more self-critical, a cornerstone of ethical decision-making.

5.5 Should Misconduct Be Criminalized?

A final note concerns the question of what to do when teaching ethics and integrity fails to achieve its goals. Should misconduct be criminalized? Until very recently, misconduct rarely led to litigation. Even Diederik Stapel, whose case was discussed in Chap. 5, was never brought before a court, though he did lose his job and a large number of his articles were retracted.

A critical issue in deciding whether research misconduct should be subject to criminal law is how it is defined, argue Dal-Ré et al. (2020). Should it only cover well-known forms of fraud, such as plagiarism, fabrication, and falsification, or should it extend to questionable research practices, such as selective reporting? This question is important, because while criminalization could deter everything that is regarded as research misconduct, it could simultaneously lead to normalization of what is not considered misconduct.

Dal-Ré et al. (2020, p. 9) admit that a research integrity organization with global authority will not emerge any time soon, but they are hopeful that ‘a strong statement that is widely supported can unify and inspire the field.’

5.6 In Sum

All of the approaches discussed above contribute to promoting a stronger understanding of research ethics and integrity in students. We do not want to suggest that one of these is better than another. We merely want to argue that they all fulfill their own role in ethics and integrity education, and that universities and educators alike have a never-ending obligation to prepare students the best they can, so that they can prepare the next generation the best they can.

6 Conclusions

6.1 Summary

In this chapter, we examined research ethics in the social sciences from the perspective of a systems approach. We situated universities in the dynamic interplay of political and economic forces, and, more specifically, we discussed the influence of new public management politics on university policies. This led us to probe whether there truly is a research misconduct ‘crisis’ going on in the social sciences.

First, we investigated whether the social sciences suffered from a replication crisis – the problem that of the few studies that are repeated at all, many fail to replicate. Along those lines, we also observed that many studies suffer from weak research rigor, so that true findings cannot be distinguished from false positives.

Second, we discussed the impact of perverse incentives on scientific practices. We found that contrary to what is often suggested, these incentives may not incite fraudulent behavior directly. But there is evidence that they link to several other undesirable trends, including a trend towards adjunctification of higher education, or increased prevalence of educators on temporary assignments, which has an indirect impact on research ethics.

Finally, we discussed three different approaches for teaching research ethics and integrity in universities. These approaches can be used to help students come to grips with ethical questions from a reactive, proactive, and reflexive point of view.

6.2 Discussion

This chapter addresses some of the more fundamental problems in the social sciences, specifically those relating to the political and economic forces at play. These forces have resulted in a ‘crisis’ of sorts, and a drastic restructuring of universities. However, we do not offer political or economic solutions to these issues, which would fall outside the scope of this book.

The second question we did not address is whether teaching ethics and integrity can help solve some of these problems, or could possibly even contribute to them. Some argue that research ethics in the social sciences needs to be modelled after similar, standing practices in the medical sciences, and that governing bodies, such as IRBs, will have to play a more prominent role in upholding professional standards. Others resist this idea based on the objection that a highly professionalized scientific enterprise undermines scientific freedom and creativity (see Resnik 1998, p. 177). Still others argue that the formalization of ethical procedures achieves the opposite of what it aims to achieve. Increasingly formalized research ethics structures cause a rupture in the relationship between ‘following rules’ and ‘acting ethically,’ the result of which is called ethics creep (Haggerty 2004).

Let us conclude by asking what your thoughts are on this question. Is your institution doing enough, or perhaps even too much, with regard to research ethics? Do you feel prepared to tackle the questions discussed in this book?