
1 Introduction

1.1 Sloppy Science

In 2012, anthropologist Mart Bax of Free University Amsterdam was already retired when suspicion arose about the validity of his fieldwork, some of which dated back to the 1970s and 80s. It was thought that if he had not fabricated his research outright, then at the very least he had manipulated it. That is to say, he was accused of (among other things) having altered and even removed crucial details in his data. He had ascribed statements to untraceable participants and described staged events that could not be verified.

An integrity commission was set to work and concluded the following year that Bax was guilty of scientific misconduct. He had presented ‘improbable events as “historical facts,” embedded in research that systematically obscures names of persons and places and muzzles sources and contains inaccuracies in a large number of places’ (Baud et al. 2013, p. 39).

The lack of openness and transparency Bax exhibited has become an exemplar of what we now call ‘sloppy science’: careless and negligent research practices that include both intended and unintended violations of scientific norms. In this case, the researcher seemed to have placed little value on verifiability and transparency, which are cornerstones of appropriate scientific practice.

When sloppy research veers into falsehoods, we speak of ‘falsifying’ or ‘falsification.’ To avoid confusion with the terminology used by philosopher Karl Popper (see below), we stick to ‘falsifying.’ Falsification in Popper’s sense means actively seeking to disconfirm a hypothesis; falsifying effectively amounts to the opposite.

1.2 Falsifying

The term falsifying, literally meaning ‘rendering false,’ covers forms of manipulation that make a dataset appear to support biased or even erroneous claims. It includes ‘trimming’ (leaving out certain findings) and ‘massaging’ (slightly changing) data, as well as altering images, misrepresenting results, and simply not reporting findings.

If fabrication (presenting fake or non-existent research data) and plagiarism (literary theft) are science’s deadly sins, punishable by severe penalties, then falsifying is its daily sin. It is a less visible, less spectacular form of misconduct, and the scientific community tends to view it more tolerantly than its lethal counterparts. Unjustly so, says Köbben (2012), who warns that the accumulation of these smaller sins represents a far greater danger to science than the (isolated) larger ones. Over time, and left unchecked, this accumulation will result in the large-scale pollution of scientific research.

This being said, some considerations must first be addressed. Not all manipulations represent a researcher’s intent to deceive, nor is every form of manipulation prohibited, as we shall see below. Also, one must distinguish between deliberate manipulations and honest errors. Furthermore, these two should not be confused with scientific disagreement (researchers challenging one another’s conclusions). Thus, although falsifying is considered ‘misconduct,’ it can be difficult to assess exactly when acceptable research practices lapse into dubious ones. It is on this note that we enter the heart of the problem of academic fraud, which is less about demarcating right from wrong, and more about ethical reflection and decision-making. The aim of this chapter is to raise awareness of these issues by exploring several dimensions of falsifying in research practices.

By and large, in this chapter we follow the research process itself. We start with the forms of bias that appear at the first stage of the process, when research questions are posed. We then discuss the falsities that result from (slight) alterations or outright manipulation during data collection and analysis.

We finish with a discussion of the biases often present when research conclusions are reported and disseminated in a skewed or one-sided way. Though this is usually referred to as publication ethics, it relates to our subject of research ethics because it discloses disturbances in the research process.

In two separate sections, we discuss the problem of self-deception (falsifying by not being critical enough) and a possible remedy against the falsehoods inherent to science.

2 Bias at the Start of the Research Process: Asking Critical Questions

2.1 Confirmation

Research starts with asking questions. But, as any student knows, asking good questions demands a self-critical attitude, and a readiness to address and counter one’s own preconceptions. In fact, scientists should actively look for information that disconfirms their opinions about the world, an action for which the term ‘falsification’ is reserved (as explained above).

The reality is that this is much more difficult than it appears. There is a long-identified experimental effect known as confirmation bias (sometimes called myside bias; see Perkins 1985; Toplak and Stanovich 2003). Confirmation bias is the tendency of individuals to judge new information in a way consistent with their preexisting ideas or convictions. People thus prefer supporting information over conflicting information and tend to overlook or disregard information that does not fit their worldview (Jonas et al. 2001).

Notorious examples are found in ‘psychic studies’ (studies into paranormal activities) and psychoanalysis (studies into the unconscious mind). In both traditions, there exists a strong tendency toward confirming what was theoretically hypothesized. But confirmation bias is far from restricted to these domains and has been observed in more empirically oriented research traditions as well.

As a case in point, Greenwald et al. (1986) examined empirical research into a phenomenon known as the sleeper effect. This is the counter-intuitive finding that a persuasive message accompanied by a ‘discounting cue’ (a prompt that indicates the message is untrustworthy) tends to develop more impact over time. For example, viewers watching a ‘smear campaign’ against one political candidate, paid for by the opposing candidate, will develop a more favorable attitude towards the message weeks after exposure, rather than immediately afterwards, despite being aware that the source is biased.

To explain this effect, it was hypothesized that over time the discounting cue becomes dissociated from the original message and therefore ceases to be effective in countering it (the dissociation hypothesis). Research into the sleeper effect was unable to confirm this hypothesis; however, that did not deter researchers from continuing to investigate it. Only much later did researchers realize that the dissociation hypothesis was incorrect, and that an entirely different explanation was required. This fixation on a single hypothesis, and the resulting neglect of alternative theories, obstructed scientific understanding of the subject for some 25 years, Greenwald et al. observed.

2.2 Challenging Bias

Effectively, confirmation bias undermines open and critical thinking and runs counter to creativity. This poses a serious challenge for science, since open-mindedness and creativity are two of its most crucial features. How can scientists avoid or counter this type of bias? Is it possible, for example, to train scientists to consider both sides of an argument? There is evidence that this may be possible, at least to a degree.

Wolfe and Britt (2008) found that when students were assigned to one side of a (somewhat controversial) topic and received instructions to search for as much information on the topic as they saw fit, they would display confirmation bias. However, when instructed to search specifically for balanced information, confirmation bias was significantly reduced. Similarly, Macpherson and Stanovich (2007) found that decontextualization instructions (instructions to put aside one’s own convictions and consider the issue from opposite sides) helped reduce confirmation bias. These findings, preliminary as they are, point to the importance of making explicit one’s expectations (Box 6.1).

Box 6.1: ‘Disconfirmation Dilemma’

How are we to deal with disconfirmation in the empirical process? If the results of an experiment do not confirm theoretical expectations, researchers are confronted with what Greenwald et al. call a ‘disconfirmation dilemma.’ Researchers can decide to either (a) reanalyze the data, (b) revise the procedures, or (c) formulate a different prediction (based on the same theory). Rarely do researchers opt for option (d): publishing the disconfirming results.

When researchers resolve the disconfirmation dilemma by repeatedly retesting predictions, instead of reporting disconfirmation, they may be accused of some form of ‘falsifying’ because they resort to theory-confirmation rather than theory testing (Fig. 6.1).

Fig. 6.1

Disconfirmation Dilemma: a flowchart running from developing a theory, deriving a prediction, setting up procedures, and testing the prediction, to reporting confirming results; a disconfirmation dilemma sends the researcher back to earlier steps, or to abandoning the problem if no further prediction can be found. (Adapted from Greenwald et al. 1986, p. 220. The dotted line represents a route infrequently taken)

3 Bending the Empirical Cycle: Manipulations During Research

3.1 ‘Lies, Damned Lies, and Statistics’

The next step in the research process consists of setting up a design in order to test hypotheses. In the social sciences, significance testing of null hypotheses is an omnipresent tool, and will be the focus of the next three sections.

The simplest example of significance testing is the evaluation of the null hypothesis μE = μC against the alternative hypothesis μE ≠ μC. Here μE denotes the mean of the outcome variable in the experimental group, and μC the mean of the outcome variable in the control group. If the p-value for testing the two hypotheses against each other is smaller than .05, the null hypothesis is rejected. If it is larger than .05, the difference between the two groups is not significant, and the null hypothesis is retained because no evidence of an experimental effect was found.
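To make the mechanics concrete, here is a minimal sketch (our own illustration, not taken from the chapter’s sources) of such a test run on simulated data; the group sizes, means, and standard deviation are assumptions chosen only for the example.

```python
# Minimal sketch of the two-sample significance test described above (illustrative data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
experimental = rng.normal(loc=10.5, scale=2.0, size=50)  # outcome scores, experimental group
control = rng.normal(loc=10.0, scale=2.0, size=50)       # outcome scores, control group

# Test H0: mu_E = mu_C against H1: mu_E != mu_C (two-sided)
t_stat, p_value = stats.ttest_ind(experimental, control)

if p_value < 0.05:
    print(f"p = {p_value:.3f}: reject the null hypothesis")
else:
    print(f"p = {p_value:.3f}: no significant difference; the null hypothesis is retained")
```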

Scientific journals have long tended to publish only results that show the experimental condition to be ‘effective,’ that is, results for which the p-value is smaller than .05. It is therefore crucial for researchers to obtain low p-values; otherwise their effort, time, and money are wasted. For some, obtaining small p-values has become a goal in itself. This raises ethical questions, which will be explored below.

3.2 Questionable Research Practices

Can research outcomes be manipulated such that lower p-values are obtained? The answer is yes, for example by removing so-called ‘outliers.’ Outliers are extreme scores, and removing them heightens the chance of getting significant results. It sounds like cheating, but that need not be the case: there can be good reasons to remove outliers. Outliers can result from data errors (incorrectly recorded data) or arise because respondents failed to understand their role or the questions asked. For instance, one survey gathered data on nurses’ hourly wages. While on average respondents reported earning $12.00 an hour, with a standard deviation of $2.00, one nurse reported an hourly wage of $42,000.00, which was clearly erroneous (it was more likely their annual income). Not removing this number would distort the outcome (see Osborne and Overbay 2004).
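As an illustration of what legitimate screening might look like, here is a small sketch of our own (not Osborne and Overbay’s procedure): the individual wage values, the median-based rule, and the cut-off of 3.5 are assumptions made purely for the example.

```python
# Sketch: flagging an erroneous wage entry with a robust (median-based) outlier rule.
import numpy as np

wages = np.array([11.50, 12.25, 9.80, 13.10, 12.00, 42000.00])  # hourly wages; one entry is clearly an annual figure

median = np.median(wages)
mad = np.median(np.abs(wages - median))        # median absolute deviation, robust to the extreme value
robust_z = 0.6745 * (wages - median) / mad     # approximate z-scores computed around the median

outliers = wages[np.abs(robust_z) > 3.5]       # assumed cut-off
cleaned = wages[np.abs(robust_z) <= 3.5]
print("flagged as outliers:", outliers)        # -> [42000.]
print("retained values:", cleaned)
```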

However, consider the case of Dirk Smeesters, professor of consumer behavior and society at the Rotterdam School of Management. His work attracted the attention of fellow researcher Uri Simonsohn of the Wharton School in 2011. Simonsohn had read some of Smeesters’ work and suspected foul play, believing Smeesters’ studies were ‘too clean to be the result of random sampling’ (quoted in Chambers 2017, p. 81). He requested and obtained Smeesters’ dataset and discovered anomalies. It seemed that Smeesters had removed participants from his data when they stood in the way of his hypotheses being confirmed. Smeesters responded that these participants ‘had not understood the instructions’ (quoted in Kolfschooten 2012, p. 270).

This did not satisfy Simonsohn. An integrity commission investigated the case and ruled that this reversal of logic, by which outliers are removed to boost significance, should be understood as ‘data massaging.’ Smeesters confirmed that he had acted ‘erroneously’ but denied that he had committed fraud: ‘What I have done was to give a study, which was already almost good, a push in the right direction’ (Kolfschooten 2012, p. 270). That didn’t help his case. Seven papers he co-authored were retracted and Smeesters resigned from his position in 2012.

What Smeesters engaged in is known as questionable research practices, or QRPs for short. QRPs have become a serious concern in the academic community. Simmons (2011) noted that ‘flexibility in data collection and analysis allows researchers to present almost anything as significant.’

QRPs take many shapes and forms. To name a few: failing to report all dependent measures, selective reporting (submitting only studies that were successful), and excluding data after looking at the impact of doing so (as Smeesters had done).

Evidence of QRPs on a large scale was found by Masicampo and Lalande (2012). They collected the reported p-values from three leading psychology journals and compared their distribution. Given that smaller p-values are more appreciated, one would expect to see a steady decline in the frequency of larger p-values. What they found instead was a steady decline, followed by a peculiar peak of p-values just below .05 (see Fig. 6.2).

Fig. 6.2

‘A peculiar prevalence of p-values just below .05’: a histogram of reported p-values showing a steadily decreasing trend interrupted by a spike just below .05. Figure by Larry Wasserman, based on the data collected by Masicampo and Lalande (2012). Used with permission from the author. (Source: Normal Deviate, entry August 16, 2012)

Many take this as evidence of ‘falsifying,’ because it appears that researchers have manipulated their data to ensure that the p-value falls just below the accepted threshold of .05.

Further evidence of large-scale QRPs was found by Leslie John and collaborators. John et al. (2012) surveyed over two thousand psychologists and found that a majority admitted to engaging in a variety of such behaviors. In their widely circulated article, they estimated that some questionable research practices are so widespread that it must be assumed that virtually everyone uses them.

This raises the question as to whether these practices constitute a new scientific norm (John et al. 2012). Discussing this controversial conclusion, Fiedler and Schwarz (2016) warn against an inflation in the usage of the term QRPs precisely because of the suggestion of normalization. Some of the reported practices, they argue, are merely ambiguous, not ‘questionable,’ while others may or may not be justifiable depending on the specifics of the case (p. 50).

We do not propose that QRPs are the ‘new norm.’ On the contrary, there is a serious danger that the scientific literature becomes polluted with ‘breakthroughs’ (significant findings) which are not breakthroughs at all. Indeed, in the June 1st, 2011 issue of Scientific American, John Ioannidis argues that exaggerated results in peer-reviewed scientific studies have reached ‘epidemic proportions’ in recent years (Box 6.2).

Box 6.2: ‘P-hacking and HARKing’

The following case, discussed in several entries on the critical website Retraction Watch in 2017, provides a rare glimpse into how the academic community responds to questionable research practices.

Several years ago, Brian Wansink, a world-renowned food researcher at Cornell University, provided a visiting PhD student with the complete dataset of a self-funded study that had failed to produce any notable results. He told the student that it was well worth the effort to search for overlooked patterns, stating that ‘there’s got to be something here we can salvage because it’s a cool (rich & unique) data set.’ The student set to work and managed to produce five articles from the dataset in just six months. In a now deleted blog post of November 2016 (‘The Grad Student Who Never Said “No”’), published on his personal website, Wansink proudly reported on this student’s success, presenting it as a ‘lesson in productivity.’

His readers were less impressed. One wrote: ‘This is a great piece that perfectly sums up the perverse incentives that create bad science. I’d eat my hat if any of those findings could be reproduced in preregistered replication studies.’ Another reader commented that what was described in the blog sounded suspiciously like p-hacking and HARKing (entry at Retraction Watch, 2.2.2017).

P-hacking (also called ‘fishing’) is a term used to describe how researchers try to uncover statistically significant patterns in a data set without having a specific hypothesis; they simply hope to find statistically significant results. HARKing is the flip side of this coin (HARK stands for Hypothesizing After Results are Known). It consists of presenting a post hoc hypothesis in a research report as if it were, in fact, an ‘a priori’ (earlier formulated) hypothesis.
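A rough simulation, our own and not drawn from the Wansink case, shows why this matters: if a study with no real effects measures many outcomes and reports only whichever one ‘works,’ significant findings appear far more often than the nominal 5% of the time. The numbers of studies, outcomes, and participants below are assumptions.

```python
# Sketch: how testing many noise outcomes and keeping the best p-value inflates false positives.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_studies, n_outcomes, n_per_group = 2000, 20, 30

false_positive_studies = 0
for _ in range(n_studies):
    # every outcome is pure noise: the experimental and control groups do not differ
    p_values = [
        stats.ttest_ind(rng.normal(size=n_per_group), rng.normal(size=n_per_group)).pvalue
        for _ in range(n_outcomes)
    ]
    if min(p_values) < 0.05:      # the 'p-hacked' write-up reports only its best outcome
        false_positive_studies += 1

print(f"studies reporting a 'finding': {false_positive_studies / n_studies:.0%}")
# roughly 1 - 0.95**20, i.e. about 64%, versus the nominal 5%
```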

Had Wansink been ‘bending the rules of the game’ by letting his student go through raw data in the hopes of unearthing something (anything), which then would be presented as a ‘finding’?

When confronted with the accusation of p-hacking, Wansink retorted that testing the null hypothesis had been his ‘plan A.’ It was when he didn’t find anything that he turned to ‘plan B.’ As Wansink explained: ‘P-hacking shouldn’t be confused with deep data dives – with figuring out why our results don’t look as perfect as we want. With field studies, hypotheses usually don’t “come out” on the first data run. But instead of dropping the study, a person contributes more to science by figuring out when the hypo worked and when it didn’t. This is Plan B’ (quoted in an entry on Retraction Watch, 2.2.2017).

Wansink’s rebuttal failed to convince readers of Retraction Watch. One wrote: ‘Deep dives are great, but they should be planned when the study is being constructed, not created after the fact in an attempt to “salvage” something from the experience’ (2.2.2017). Another sarcastically remarked: ‘Wansink’s use of the phrases “our results don’t look as perfect as we want” […] pretty much speaks for itself’ (3.2.2017).

However, not everyone saw wrongdoing. Another reader wrote in Wansink’s defense, arguing that not all exploratory studies constitute ‘p-hacking’: ‘[There] is nothing wrong with a researcher honestly engaging in and presenting an exploratory analysis without a single pre-defined hypothesis. As long as these studies are presented honestly, they can provide useful insights and generate useful hypotheses that can later be verified (or debunked) through attempts to replicate, often by other researchers in the field’ (19.2.2017).

The result of Wansink’s actions? By the summer of 2019, when this chapter was written, 17 of his papers had been retracted (one even twice) and he had resigned from his position. (See Resnick and Bellus 2018, for further discussion.) (Fig. 6.3)

Fig. 6.3

Threats to reproducible science: a circular flow model in which threats such as publication bias, failure to control for bias, low statistical power, poor quality control, and p-hacking attach to the successive stages of generating hypotheses, designing studies, collecting and analyzing data, and publishing. (Adapted from Munafò et al. 2017)

3.3 Image Manipulation

Recent technological advances provide researchers with a wealth of opportunities for furthering their research in ways not available fifteen or twenty years ago. But as with any change, new ethical considerations emerge. In the case of digital images, there is a growing concern in the scientific community over how to properly handle them.

Today, there are multiple known cases of unethical manipulation of images that affected the interpretation of the data presented, a number of them having led to retractions. One such case, a 2009 paper published in the Journal of Biological Chemistry by Spanish researcher José G. Castaño, was retracted because (quoting from the notice published September 9th 2016 on Retraction Watch) ‘the same image was used to represent results of different experimental conditions’ on multiple occasions; the notice added that the ‘background of one image had inappropriately been adjusted.’

A lack of awareness of what is considered an acceptable form of image manipulation calls for the creation of guidelines to help researchers distinguish between appropriate and inappropriate use of digital images. A set of such general principles is discussed by Cromey (2010), who compares these guidelines to what is already established practice in the field of photojournalism. A sampling of these guiding principles includes the following recommendations:

  • Digital images should be acquired in a manner that does not intend to deceive the viewer or to obscure important information.

  • Manipulation of digital images should be performed only on a copy of the image.

  • Simple adjustments and cropping are acceptable but lossy (irreversible) compression should be avoided, and use of software filters to improve image quality is not recommended.

  • Cloning or copying objects into a digital image is considered highly questionable (Box 6.3).

Box 6.3: ‘Consequences of Retraction’

With an increased awareness of research ethics in our day and age comes an increased awareness of the consequences of misconduct. We quote from an anonymous cry for help, posted on October 16th, 2013 on ‘Editage Insights’ (a platform for researchers, authors, publishers and academic societies): ‘I recently got an email from the editor of a journal in which my paper is published, requesting me to retract the paper because they found some errors in my data and statistical analysis. I am worried about my reputation if I have a retracted paper. I may not get a grant for my next study. Please advise me.’

A response posted on March 30th, 2017 read: ‘I would encourage you to respond positively to the journal editor’s request and offer to have your paper retracted. If you do so, the journal’s retraction notice will inform readers that the paper has been retracted by agreement among the authors and the journal editor, owing to errors in data.’ (source: Editage Q&A).

4 Bias in Disseminating Research: Publication Bias

4.1 File-Drawer Problem and False Positives

Imagine a researcher testing the effects of the new and promising treatment ‘X’ (say, a particular form of cognitive behavioral therapy for a certain type of anxiety disorder). To the researcher’s disappointment, a comparison between the experimental group, who received this form of therapy, and a control group, who received no therapy (or a different therapy), resulted in a p-value of .14. Since this is larger than .05, the null hypothesis is not rejected, and the results are not published. Another researcher (unaware of the first researcher’s work because it was not published) is interested in the same therapy. Their comparison results in a p-value of .37, and again the results are not published. Shortly thereafter, a third researcher evaluates treatment ‘X’. They find a p-value of .02, and as a result the findings are published.

Based on this one publication by researcher No. 3, the unsuspecting reader will conclude there is evidence for the effectiveness of treatment ‘X’. In reality, this is a false positive, because there is actually more evidence to the contrary: that the treatment doesn’t work. The problem is that this evidence was never published. Our understanding is obscured by the fact that studies showing no result are not published. Robert Rosenthal coined a term for this: the file-drawer problem.
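The arithmetic behind this worry can be made explicit with a small simulation of our own (the number of research teams and the sample sizes are assumptions): even if treatment ‘X’ has no effect at all, a literature that prints only significant results will still accumulate ‘evidence’ for it.

```python
# Sketch of the file-drawer problem: only significant results of a non-existent effect get published.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_teams, n_per_group = 1000, 40

published, filed_away = 0, 0
for _ in range(n_teams):
    treated = rng.normal(size=n_per_group)   # no true effect: both groups come from the same distribution
    control = rng.normal(size=n_per_group)
    p = stats.ttest_ind(treated, control).pvalue
    if p < 0.05:
        published += 1                        # 'significant': submitted and published
    else:
        filed_away += 1                       # null result: stays in the file drawer

print(f"published 'effects' of a non-existent treatment: {published}")
print(f"null results the literature never sees: {filed_away}")
# roughly 5% of the teams (about 50 of 1000) end up publishing a false positive
```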

‘False positives’ and the complementary ‘file-drawer problem’ relate not so much to theoretical or methodological issues in research as to questions of dissemination (the communication or non-communication of research findings), which fall under publication ethics.

If there are frequent false positives and/or numerous unpublished null results, any meta-analysis of a particular research subject will yield a distorted picture. How serious is this problem?

One way of researching this question is to compare the amount of research undertaken with the number of publications stemming from that research. In many fields of research, pre-registration is required: the researcher must catalogue the research protocol in advance, submitting hypotheses, methodology, and expected findings. This makes it easier to check for both p-hacking and HARKing, but it also makes it possible to examine submission bias (the tendency to submit for publication only studies with ‘positive findings’). In the social sciences, though, pre-registration is a recent phenomenon and not yet the norm, with a few exceptions.

One exception is the public registry TESS (Time-sharing Experiments for the Social Sciences). Franco et al. (2014) followed studies registered in TESS over a ten-year period to see how many of them were eventually published in peer-reviewed journals. It turned out that 80% of the registered studies were written up, but fewer than half (48%) were published. Unsurprisingly, there proved to be a strong relationship between the outcome of a study (whether or not the hypothesis was supported by the results) and its being published. Studies with null results were far less likely to be published, and often were not even written up at all.

Why do researchers opt not to write up ‘null results’? Franco et al. (2014) questioned a selection of researchers by email and received answers that confirmed their suspicion. As one of their respondents reported: ‘I think this is an interesting null finding, but given the discipline’s strong preference for p < 0.05, I haven’t moved forward with it.’ (p. 1504).

4.2 Reviewer and Editorial Bias

Another source of publication bias is located in the process of peer review and journal editorship. Both peer reviewers and journal editors effectively function as gatekeepers, deciding whether an article is worthy of publication. We discuss both roles below, starting with peer reviewers.

The ‘peers’ in the peer review process are researchers themselves, usually experts recruited from the relevant field. They are asked to assess the quality of manuscripts sent to a journal. The review procedure in which they participate is as a rule blind: the author is not aware of the reviewer’s identity (single blind), and often the reviewer does not know the identity of the author either (double blind). Peer reviewers are not paid for their work; they should have no competing interests, should be unconnected to the authors, and are expected to act solely out of the desire to guarantee objectivity and impartiality in science.

But does it really work like that? Some argue that peer review is indeed the best system we have, providing impartial quality control. Others contend it may once have functioned as such, but no longer does in today’s society, where science can no longer afford the luxury of operating from an ivory tower (we turn to this discussion in greater detail in Chap. 9). And then there are those who argue that the peer review system has never guaranteed quality control in the first place.

They point to the fact that some of the most important and groundbreaking works in the history of science were never peer reviewed, that some of these works were initially rejected by peer reviewers, and that, vice versa, flawed, nonsensical or even absurd papers have been accepted by them (see Box 5.7 on ‘hoaxing’).

If we cannot rely on peer reviewers to detect errors, identify misconduct, or spot what is truly innovative, then the peer review system does not safeguard quality. But perhaps it is even worse. There is reason to believe that the peer review system is biased in at least two ways.

  1. Peer reviewers may be too conservative. Reviewers are believed to be biased against new findings and new ideas. Also, they tend to focus on finding weaknesses in manuscripts rather than on the positive contributions therein. Suls and Martin (2009, p. 43) argue this may be so because ‘appearing to be too lenient seems worse than appearing to be too harsh.’

  2. Reviewers seem prejudiced in favor of prestigious research institutions and established authors. Peters and Ceci (1982) found confirmation of this suspicion in a small-scale study. They selected 12 previously published articles, originally written by researchers from prestigious American psychology departments, and resubmitted them to the same (highly prestigious) journals that had previously published them, but under fictitious names and fictitious institutions. Only three articles were detected as ‘resubmissions’; eight of the nine remaining articles were rejected on grounds of insufficient quality (some were even criticized for having ‘serious methodological flaws’).

Consider next the role of journal editors. As gatekeepers, editors not only have to safeguard quality but also to present interesting, original, and novel findings to their readership. This may lead to a bias against replication studies, which offer nothing new, even though replication is the ‘gold standard’ in science (see Møller and Jennions 2001). Kerr, and later Rowney and Zenisek (quoted in Hubbard and Armstrong 1994), conducted surveys among editors and review board members of management and psychology journals, and indeed found confirmation of editorial bias against replication studies.

The editor’s obligation to present novel and interesting results can furthermore lead to an effect known as the proteus phenomenon. In essence, whenever positive results are published, a window of opportunity quickly opens for researchers to publish findings that contradict these results. Editors often publish these findings because they too are a ‘novelty.’ The net effect is a tendency for journals to rapidly publish conflicting results (see Pfeiffer et al. 2011) (Box 6.4).

Box 6.4: ‘Adjusting the Data? A Dilemma’

On the r/AskAcademia Reddit community, a student identified as ‘Throwinbin’ (henceforth ‘T’) published a telling post. T described a supervisor who had asked them to use a particular dataset with a specific analytical approach. The problem was that the data (provided by a third party) did not fit the format that the approach required. In T’s words: ‘Using [my supervisor’s] method will involve removing whole articles from our data set and changing the (important, central) main attribute of the set. It’s basically massaging the data until it fits the model of his method. I’m not comfortable doing this as I strongly believe that it’s going to give us false results […]. How to raise this with [my supervisor] without sounding really bad?’

Below are the (edited) exchanges between T and three community members who responded to the post. From these exchanges, it becomes clear that ‘data massaging’ is not the only issue at stake, and a number of other ethical dimensions are in play. Consider how T managed the situation:

  • Respondent A: ‘Is there a person in your department you could consult (with no stake in your publications)?’

  • T: ‘I have a “second supervisor,” but I’m not keen to take this concern elsewhere in the department at the moment. I like my supervisor and I don’t want to harm his career.’

  • Respondent B: ‘If you do want to approach your supervisor, then ask how to organize the data to make them fit (rather than the approach of saying it is wrong), and maybe he will explain something you hadn’t thought about.’

  • T: ‘I’ve done this – which is how I now have in emails him telling me to remove certain rows from the data, and later reorganize whole columns without making sure the changes carry over the entire dataset. I’ve made the changes he asked for and ran his method and it looks pants [not good].’

  • Respondent C: ‘Could you not go down the ‘play dumb’ route? – “I’m confused, maybe I’m just stupid, I’m not sure how your [method] is entirely relevant. Can you explain how it’s better than x”?’

  • T: ‘I do understand his need for this method well; he wants to move on from our current institution and this will look good on a CV. I sympathize with him as I’m not very happy with the situation in our department either and would be looking to move on if I could. I’ve admitted defeat and made our data work his model. Results so far are rubbish, so I’m going to take it to him and put the ball in his court – though if he insists on it going into a paper I don’t want my name anywhere near it.’

  • You: What advice would you give ‘T’?

5 Self-Deception

5.1 Mistakes Happen

Earlier we noted that various forms of falsifying must not be confused with ‘honest mistakes.’ But there is one type of ‘honest mistake’ that should be considered a form of falsifying, even if the researcher had no intention to mislead: self-deception. Self-deception occurs when the researcher is so strongly convinced that a particular model or theory is correct that they are unable to accept evidence to the contrary. We discuss a few forms of self-deception below.

Perhaps the strongest form of self-deception consists of discovering phenomena that do not exist, ironically dubbed pathological science. The ‘discovery’ of so-called ‘N-rays’ (an alternative to X-rays) by French physicist René-Prosper Blondlot in 1903 counts as one such example (Grant 2008, pp. 88–89). Blondlot built a device that enabled him to ‘see’ these rays. He gave demonstrations, and others, if trained properly (or told what to look for), would see them too. The non-existence of N-rays was exposed when a sceptic secretly turned off the device and Blondlot still claimed to ‘see’ the rays.

The biography of Wilhelm Reich, a former Freudian who had gone astray (Sharaf 1983) offers a similar story. Reich was convinced he had discovered a new form of energy, which he called ‘orgone.’ He built a device in which orgone would accumulate, and while testing his device, he found a constant temperature difference of 2 °C inside the ‘orgone accumulator.’ Believing this to prove the validity of his discovery beyond a reasonable doubt, he contacted Einstein, who kindly agreed to study his device. Two weeks later, Reich received word from Einstein, who stated that his assistant had come up with a simpler explanation for the temperature difference – lack of air circulation. Reich was unswayed and maintained faith in his ‘discovery.’

In the field of parapsychology, Alfred Russel Wallace, a British naturalist and the co-discoverer of natural selection, offers another interesting example. During his investigations, Wallace accepted certain observations as evidence for the existence of extra-perceptual phenomena. In his autobiography, he wrote about his conversion to ‘Spiritism’ after having attended a series of séances with a ‘medium’: ‘I was so thorough and confirmed a materialist that I could not at that time find a place in my mind for the conception of spiritual existence, or for any other agencies in the universe than matter and force. Facts, however, are stubborn things. […] Facts became more and more assured, more and more varied, more and more removed from anything that modern science taught, or modern philosophy speculated on. The facts beat me’ (quoted in Shermer 2002, p. 192).

The irony of ‘facts’ having ‘beaten’ Wallace is probably not lost on the reader, for the séances were in reality very likely carefully orchestrated performances by frauds. Wallace himself, however, was not a fraud; he was taken in by the performance (Fig. 6.4).

Fig. 6.4

Carried away by self-deception: a sketch of a man lying asleep, with the moon in the background

5.2 To Remember or Not to Remember (That Is the Question)

The cases of self-deception discussed above may be comical illustrations of how scientists were able to fool themselves in the past. Yet the question arises whether we can be sure that some of our own present-day discoveries are not also instances of self-deception.

In the last two decades of the twentieth century, a debate emerged within psychology over whether or not repressed childhood memories of sexual assault could be recovered with the aid of specific forms of therapy (Pezdek and Banks 1996). Proponents of recovered memory therapy argue that children who go through such assaults ‘dissociate,’ meaning that they repress all memories of the traumatic experience and will not remember it unless aided in some way.

Recovered memory therapy became prominent in the 1990s. Certain recipients of the therapy recalled highly bizarre satanic-abuse memories. In some cases, these testimonies led to prison sentences for men accused of these crimes. However, it turned out that at least some of these accusations were false and the ‘perpetrators’ were released. This prompted critics to question the validity of recalled memories (for a full discussion see Loftus and Ketcham 1994).

Did the ‘recovered memory movement’ find in the testimonies of their clients what they wanted to hear, or had they unearthed a new phenomenon which mainstream science refused to accept because it was too controversial? (Box 6.5)

Box 6.5: ‘Not Sure If It’s Research Misconduct’

In February 2020, a PhD student turned to the ‘r/AskAcademia’ discussion platform on the website Reddit for advice, writing that it seemed as if a professor had been engaging in research misconduct, and that they were considering taking the matter to the board.

When I opened the file [of my professor], I noticed that a lot of the figures have been altered. In several cases he took a bar graph (looks like a screenshot from a Prism file) and then covered one of the bars with a different bar. The new bar would have a different height and number of significance asterisks than the original one. I could move over the replacement bar and see the original figure – the replacement was clearly cropped out of a different figure and pasted onto this one. […] I talked to a few other students about this. They think it’s possible he’s just lazy or doesn’t know how to use Prism. Like maybe his students repeated an experiment and that changed the results, but he didn’t want to (or know how to) update the figure in Prism so he just pasted the new bar on top? This seems sketchy to me, and I don’t think it explains the difference from his published figure either. […] I’m hesitant to ask him directly. If he actually is falsifying data, it’s not like he’s going to admit it to me. I’d prefer to speak with the department chair and see what she recommends. However my classmates think going over his head without first asking for an explanation would be wrong. I’m really at a loss for what to do.

Here are some of the replies this PhD student received:

  1. I think it’s quite weird that you are worried that it may reflect badly on you if you ask him directly. Yet, you think that the more drastic approach of going to the chair is less worrying.

  2. You don’t have evidence of misconduct and there are only downsides to yourself from making accusations. I’d say forget about it.

  3. I think a good and non-accusatory way to go about it is (if initially via email): ‘Hi X, I’ve noticed that there are revisions to the graphs in the PowerPoint. Did you happen to obtain more evidence/data changing the original graphs and supporting your conclusions? If so, what steps or projects are you pursuing after the new information?’

Which of these pieces of advice do you prefer? Or would you consider a different approach?

Source: Reddit, AskAcademia, ‘Not Sure if What I’m Seeing is Research Misconduct?’

6 Science’s Self-Correction

6.1 Self-Correction

Discoveries of falsehoods in research are traditionally met with a defensive system of self-correction known as retractions. This quite simply means that a ‘contaminated’ publication is flagged but not withdrawn from the public domain. A note is attached to the paper that states it has been ‘retracted.’ Retraction can take place with or without the author’s consent and can be argued on the grounds of methodological or theoretical flaws, or because research misconduct was identified.

Retractions are not to be taken lightly. Until quite recently, misconduct and subsequent retraction of a publication remained an internal matter, known to only a few parties. However, with the rise of digital publications, retractions have become much more public (and visible). For example, the website Retraction Watch is dedicated exclusively to highlighting misconduct, fraud, and retractions in science (across all disciplines). It keeps track of virtually all that is going on in the academic world in a very public manner, posting the full names and affiliations of all parties involved. A retracted article, though ‘withdrawn,’ not only remains visible, it effectively becomes a permanent stain on an author’s reputation (see Box 6.3 for an impression of this consequence).

Apart from the personal consequences, there remains the question of what damage fraudulent articles can cause. After all, an undetected (unflagged) fraudulent paper will remain in the public domain, continuing to act as a source of pollution in future literature. This is important to consider, as a great deal of time may pass before a fraudulent study is retracted. Interestingly, in the last few decades the retraction process has quickened, with the number of retracted papers increasing in lockstep. This raises three questions: (1) Is this increase a good thing or not, and how do we account for it? (2) Are retractions the right answer to the problem of QRPs? (3) Are there better alternatives? We will briefly touch on these issues in the sections to come.

6.2 Beyond Retraction?

In an often-cited article, Daniele Fanelli (2013) investigates retractions in the scientific literature. Scouring data from the Web of Science (a publisher-independent global citation database) for the entire twentieth century, Fanelli notes a sudden increase of some 20% in retracted papers per year since the 1980s. He then compared this increase to the number of ‘corrections’ applied to articles in the same period, which did not show a similar increase.

Fanelli proposed two hypotheses that offer radically different outlooks on the question of whether the increase in retractions signifies a positive development. One attributes the increase to growing misconduct within the academic community, and thus sees the growing number of retractions as a bad sign. The other holds that the system has become more resilient, and thus that the increased number of retractions signifies something good.

Fanelli argues that the evidence in his study suggests that the ‘stronger system hypothesis’ is more likely to explain the rise in retractions than the hypothesis that scientists have become more fraudulent. Peers, editors, and the scientific community at large seem to have become more sensitive to and aware of misconduct, and consequently, have become more proactive about it (see furthermore Ioannidis 2012; Fanelli 2018).

This raises the question: even if editors have become more aware of the issue, can we trust that science will be able to rectify (all of) its mistakes this way? Stroebe et al., reviewing a number of recent examples of misconduct, are not overly optimistic. Science is based on trust, they argue, and as such ‘scientists do not expect their colleagues to falsify their data, and do not look for signs of fraud’ (2012, p. 680). What would really help, Stroebe et al. argue, is to fortify the position of whistleblowers, who, after all, have been responsible for detecting the majority of falsities in the first place.

Furthering this line of thinking, consider Post-Publication Peer Review (PPPR). PPPR is a commenting system that allows publications to be reviewed and discussed online after they have been published, on platforms such as PubPeer and Open Review, on a (mostly) permanent basis.

Appraising this approach, Jaime Teixeira da Silva (2015, p. 37) considers that the advantage of PPPR is that it ‘makes authors, editors, peers, journals and publishers accountable for what they have published or approved of publishing in the framework of their publishing models.’

However, the question is whether PPPR should consist of anonymous reviews (comparable with traditional peer review) or not. Teixeira da Silva is a vocal opponent of anonymity in peer review and a severe critic of PubPeer, which, he argues, publishes anonymous reviews and allows unchecked accusations with little or no accountability.

Evidently, PPPR invites questions about the quality of those peers, but it also points to a new direction science is taking. In the twenty-first century, research in the social sciences is no longer considered the isolated effort of one individual (or a small group of individuals), but rather that of whole networks. Drawing on the strength of these collectives while answering the increasing call for greater transparency, social scientists are increasingly inclined to deposit and share data in open repositories, to pre-register protocols, and to commission experts to monitor and review research. Thus, in the social sciences (following the medical sciences), ethical review boards have attained a progressively more important function in research.

While many of these initiatives help the social sciences become more open and more accountable, and help diminish publication bias and forms of sloppy science, they do little to overcome confirmation bias, which still looms over the field, mainly because scientists still tend to publish only ‘significant’ results. In an attempt to address this problem, Ioannidis (2012) and van Assen et al. (2014), among other advocates, propose that journals should no longer focus on novel findings. Let them instead publish everything, including null results. This change, they argue, will make the scientific record complete rather than fragmented.

7 Conclusions

7.1 Summary

In this chapter, we’ve followed the empirical cycle from beginning to end, exploring the various ways bias may disturb or corrupt our findings. We found that research does not always reveal what was intended or desired, leading to the danger of misrepresentation, one-sidedness, or even the production of downright falsehoods.

We showed how the questions we ask may be biased towards the confirmation of what we already know. Confirmation bias (also known as myside bias) effectively obstructs creativity and progress in science and impedes more objective, or at least impartial, exploration.

Because there is a strong incentive to publish research that shows significant results, the danger of questionable research practices (QRPs) arises. Data massaging, p-hacking, HARKing, and other tricks meant to lower the p-value and thus ‘heighten’ the validity of research outcomes have the potential to pollute research findings on a large scale and endanger science’s credibility.

The file-drawer problem and false positives point to the danger of bias during the dissemination process and fall under publication ethics. The tendency to report only what is significant, and to avoid reporting null findings, creates a distorted view of reality, further compounded by editorial and reviewer bias and by the dangers of self-deception.

Increased retractions of ‘contaminated’ (fraudulent) papers show that science is able to self-correct, but the question is whether this is enough. Some argue that the system is strong enough to correct itself in the long run, whereas others believe more drastic measures are called for, including post-publication peer review (PPPR), pre-registration, and new journal policies to publish everything instead of only ‘interesting’, ‘novel’, and ‘significant’ findings.

7.2 Discussion

Falsifying in research clearly poses a threat to objectivity, verifiability, and other core values of science (see Chap. 2). Part of the problem may be attributed to overly ambitious researchers not taking these standards seriously enough, but part of it cannot be attributed to willful misconduct. Confirmation bias may be the result of something that remains entirely unconscious, and the file-drawer problem may be more the fault of the system than of an individual researcher. Similarly, editorial bias seems ingrained in the larger dissemination process and certainly requires further attention. What suggestions do you have for addressing these ever-present issues of falsifying?