1 Overview

Plagiarism is far older than the internet. Its roots can be traced to ancient Roman practices and to the onset of modern sciences in the Enlightenment era. One of the most common interpretations is tied to individual authorship and the need to protect original contributions to society or research (see Sutherland-Smith, 2015) and to secure the economic returns of original text production for its “owners.” From this perspective, plagiarism is considered a kind of intellectual theft (the word “plagiarism” comes from Latin plagiarius, or “kidnapper”), an offense against the legal protection of proprietary rights. Although plagiarism is not a criminal offense, it often leads to civil litigation because of copyright violation, or to personnel actions for breaches of ethical standards.

With the establishment of the Web in the 1990s and its introduction into homes, schools, and universities, the threat of plagiarism took on new urgency. The immediate culprit was the new opportunity for writers to copy and paste other authors’ work from the Internet into their own texts. This opportunity increased as the availability of texts from online sources increased exponentially and new, powerful search engines such as Google made those texts readily accessible. In addition, paper mills (Bartlett, 2009) as a form of contract cheating (Lancaster & Clarke, 2015) also increased as internet platforms offered the risk-free transfer of texts for money. To cope with these new digital circumstances, universities developed integrity divisions and codes of ethical conduct for students (Anson, 2008). Also, a new interest in plagiarism theory appeared, revealing nuances of student source use such as “patchwriting” (Howard, 1999), spawning studies of student research and referencing practices (see Jamieson & Howard, 2011; Citation Project) and distinguishing between the uninformed misuse of sources by students and the deliberate appropriation of other writers’ text without attribution (WPA Council, 2019).

Theories of plagiarism also explore its meaning and range of application. As Weber-Wulff (2014) points out, there is no valid definition of plagiarism. In part, the lack of certainty about plagiarism comes from varying practices and beliefs in different discourse communities about the processes of acknowledging others’ work (see Anson, 2011, and Anson & Neely, 2010, for specific cases; see also Maxwell et al., 2008). To complicate matters, plagiarism applies not only to text but also to data, source code, pictures, tables, and patents, all of which need different kinds of tracking and detection technology. Weber-Wulff (2014) offers an even wider list of plagiarism activities, including translation plagiarism, plagiarism of structures, self-plagiarism, patchwork referencing, and others. In addition, plagiarism is often conflated with other forms of textual deception such as “contract cheating” (when someone produces the writing for the person claiming authorship, which plagiarism software is usually unable to detect; see Curtis & Clare, 2017; Lancaster & Clarke, 2015; for data on contract cheating, see Newton, 2018).

In educational contexts, student plagiarism usually does not violate property rights but violates the rules of disclosing the origin of ideas and text. In academic fields, most published text may be used and at least partly reproduced, provided it is properly cited or, in some cases, that the original author is compensated for the rights of reproduction (Hyland, 1999). In classroom contexts, concerns are less focused on copyright violations than on ensuring that the work students submit is their own. The reasons include the purposes of their learning, the need to evaluate the quality of the texts they write, and the importance of teaching them proper academic citation processes for future work. For these reasons, most educational institutions view student plagiarism as a violation of a contract-like agreement that the work is original and that all others’ text is properly cited. Violations are not treated in legal terms but as a breach of an honor code, with punishments (if caught) of failing a specific paper or the entire course, being put on academic probation, receiving a “scarlet letter” on one’s graduation transcript (or the metaphorical equivalent; see Swagerman, 2008), or being expelled from the institution.

Plagiarism is not a marginal issue; substantial numbers of students are willing to cheat with their assignments, as shown in several large-scale questionnaire and survey studies of academic integrity (McCabe, 2005; McCabe et al., 2001). A survey carried out between 2003 and 2005, with 63,700 responses from undergraduate students and 9,250 from graduate students, showed the following percentages of students who have engaged in the respective behavior at least once in the past year (McCabe, 2005; percentages are listed for undergraduates first and graduates second):

| Behavior | Undergraduates (%) | Graduates (%) |
|---|---|---|
| Working with others on an assignment when asked for individual work | 42 | 26 |
| Paraphrasing/copying a few sentences from written source without footnoting | 38 | 25 |
| Paraphrasing/copying a few sentences from Internet source without footnoting | 36 | 24 |
| Receiving unpermitted help from someone on an assignment | 24 | 13 |
| Fabricating/falsifying a bibliography | 14 | 7 |
| Turning in work copied from another | 8 | 4 |
| Copying material almost word for word from a written source without citation | 7 | 4 |
| Turning in work done by another | 7 | 3 |
| Obtaining a paper from a term paper mill | 3 | 2 |

These data, however, are not longitudinal. Even though internet use has increased exponentially, it is not clear whether it has caused an increase in plagiarism, as Harris et al. (2020) showed in a large sample of adult learners in an online teaching context. The McCabe study even showed slightly less copying from internet sources than from print material (see also Walker, 2010).

Other research on cheating shows that a relatively small group of students tend to engage in serious types of plagiarism (in contrast to the unknowing misuse of sources because of lack of training), but most students today are or have been affected by the practice of plagiarism detection introduced since the early 2000s. In the teaching of writing, plagiarism detection has an additional consequence which is variously called plagiarism anxiety, plagiarism phobia, or plagiarism paranoia. All three refer to the fear of being punished for inadvertently and unknowingly plagiarizing. This fear arises under two conditions: when rules for referencing are not clear, and when instructional discourse moves plagiarism into the domain of misconduct and academic punishment. For the teaching of writing and referencing, it is essential to give students the opportunity to make mistakes. A differentiation between errors and misconduct is necessary, and referencing skills should not be learned in a climate of punishment and pseudo-criminal charges, as the use of plagiarism detectors often implies, but rather in a context of critical thinking (Vardi, 2012). When plagiarism software is mistakenly assumed to unerringly detect plagiarism, as Silvey et al. (2016) claim is the problematic case at Australian universities, the learning of intertextuality is prevented rather than fostered. In addition, students need to be acquainted with the nature of plagiarism detection so that if and when they are in a context that uses detection programs, they are well informed about how these programs work.

2 Core Idea of the Technology

Since roughly 2000, a constant stream of new tools and technologies has emerged to identify plagiarism in students’ and professionals’ documents. Plagiarism detection software became a matter of public interest and a great concern in higher education policy even though the real numbers, as the data above show, never reached the imagined dimensions of internet plagiarism. Reduced to its core operations, the technology indicates the similarity of a given text to already published texts or texts held in the system’s database. The critical requisites of this software are (a) the access it has to a database of published texts and the size of this database, and (b) the algorithm that calculates the similarity.
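The two requisites named above, a reference database and a similarity algorithm, can be illustrated with a minimal sketch. It assumes one common text-matching approach, overlapping word n-grams ("shingles") scored by containment; commercial tools do not disclose their algorithms, so this is not a description of any particular product. The function names, the three-word window, and the toy corpus are assumptions made for illustration.

```python
import re


def shingles(text: str, n: int = 3) -> set:
    """Return the set of overlapping word n-grams (shingles) in a text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(submission: str, source: str, n: int = 3) -> float:
    """Fraction of the submission's shingles that also occur in the source."""
    sub = shingles(submission, n)
    if not sub:
        return 0.0  # submission shorter than one shingle: nothing to match
    return len(sub & shingles(source, n)) / len(sub)


def similarity_index(submission: str, corpus: dict, n: int = 3) -> dict:
    """Score a submission against every text in a (hypothetical) corpus."""
    return {name: containment(submission, text, n) for name, text in corpus.items()}


if __name__ == "__main__":
    # Toy corpus standing in for the system's database of texts.
    corpus = {
        "source_a": "the quick brown fox jumps",
        "source_b": "an entirely unrelated sentence about weather",
    }
    print(similarity_index("the quick brown fox sleeps", corpus))
```

With a three-word window, a submission that shares two of its three shingles with a source scores about 0.67 against that source, regardless of whether the shared passage is quoted and cited.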

However, existing tools cannot unerringly identify plagiarism; the software can only indicate cases of possible plagiarism through text matching, but cannot identify plagiarism itself. It cannot, for example, differentiate between well-referenced similarities and plagiarized ones. They all are included in the index of similarity. These facts have called into question the use of the terms “plagiarism detection software” or “plagiarism checkers.” Foltýnek et al. (2020) suggest the alternative terms “text-matching software” or “software supporting plagiarism detection,” while Wikipedia prefers “content similarity detection.” Weber-Wulff (2019) calls the software “a crutch and a problem,” and does not see it as a solution for the plagiarism problem. From her experience of annually testing several publicly available tools, she writes that

The results are often hard to interpret, difficult to navigate, and sometimes just wrong. Many systems report false positives for common phrases, long names of institutions or even reference information. Software also produces false negatives. A system might fail to find plagiarism if the source of the plagiarized text has not been digitized, contains spelling errors or is otherwise not available to the software system. Many cases of plagiarism slip through undetected when material is translated or taken from multiple sources. Assessments depend on both the algorithms used and on the corpus of work available for comparison. On the other hand, they can do more than detect plagiarism as they are able to indicate all parts of a text that matches sources texts on the internet. This may also be used to learn, control, discuss, or study referencing.

Weber-Wulff further discusses the intention of the devices, the processes of detection, and the ways the systems have been used. She concludes that “Academic integrity is a social problem; due diligence cannot be left to unknown algorithms.” Still, the comparisons show that the quality of the tools differs markedly; her conclusion is not to abandon the technology but to use it differently.

While one area of plagiarism research and development still aims to improve plagiarism detection and invest pseudo-criminological interest in detecting more subtle kinds of plagiarism and obfuscation, many practitioners in this field move in another direction, using the software as a tool for learning about the practice of drawing on the work of others and appropriately acknowledging the source of that work.

Grammarly, for example, originally designed as an editing tool, also offers a plagiarism checker for writers with a much gentler assumption about the reasons for copying from other papers than the usual plagiarism definitions suggest:

You’re working on a paper and you’ve just written a line that seems kind of familiar. Did you read it somewhere while you were researching the topic? If you did, does that count as plagiarism? Now that you’re looking at it, there are a couple of other lines that you know you borrowed from somewhere. You didn’t bother with a citation at the time because you weren’t planning to keep them. But now they’re an important part of your paper. Is it still plagiarism if you’re using less than a paragraph? (Grammarly).

Here, Grammarly points to inattentiveness or unintended errors as causes of plagiarism rather than to collusion, cheating, or intentional copying. Its intent is to offer its services to prevent plagiarism.

Other plagiarism detection tools are aimed at professional communities, particularly academics. iThenticate, for example, is a platform used by many journal editors and researchers to detect plagiarism and text replicated across articles by the same author(s) (see www.textrecycling.org). The database is populated by 93% of top-cited journal content and over 70 billion current and archived web pages. The tool is used both formatively by researchers (to ensure they have made no errors of citation or attribution) and as a tool to detect plagiarism or text recycling.

3 Functional Specifications

Plagiarism software contains several functionalities that interact to analyse text input:

  • a field to insert text;

  • a function to pre-process text that typically includes document format conversions and information extraction (Foltýnek et al., 2019);

  • a corpus of texts used as a reference field for the text in question or access to a search engine (often including but not limited to Google);

  • an algorithm comparing the indicated text with the ones from the corpus or the internet;

  • a control panel indicating text similarity (alternatively, text originality) as a percentage or the number of matches with existing texts;

  • a way of marking all text that is identical to any of the originals in the corpus, including references to the source and indicating the original text.
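The components listed above can be sketched end to end in a few lines. This is a minimal illustration, not any vendor's implementation: it assumes plain-text input, uses Python's standard difflib.SequenceMatcher as a stand-in for the undisclosed comparison algorithm, and treats runs of four or more identical words as a match. The function names, the min_words threshold, and the sample corpus are hypothetical.

```python
from difflib import SequenceMatcher


def preprocess(raw: str) -> list:
    """Pre-processing step: normalize case and whitespace, split into words."""
    return raw.lower().split()


def match_report(submission: str, corpus: dict, min_words: int = 4):
    """Compare a submission with each corpus text.

    Returns (percent_matched, matches), where each match is a
    (source_name, matched_text) pair pointing back to the source.
    """
    words = preprocess(submission)
    covered = [False] * len(words)  # which submission words matched anything
    matches = []
    for name, source in corpus.items():
        matcher = SequenceMatcher(None, words, preprocess(source), autojunk=False)
        for block in matcher.get_matching_blocks():
            if block.size >= min_words:  # ignore short, common phrases
                span = words[block.a:block.a + block.size]
                matches.append((name, " ".join(span)))
                for i in range(block.a, block.a + block.size):
                    covered[i] = True
    percent = 100.0 * sum(covered) / len(words) if words else 0.0
    return percent, matches


if __name__ == "__main__":
    corpus = {"handbook": "All students must cite every source they use when writing."}
    percent, matches = match_report(
        "Students must cite every source they use in their own essays today", corpus
    )
    print(f"similarity: {percent:.1f}%")
    for source_name, text in matches:
        print(f"  matched '{text}' in {source_name}")
```

The percentage corresponds to the control-panel figure and the list of matches to the marked passages with their source references. Raising or lowering min_words changes how many common phrases are reported, which is one reason different tools produce different scores for the same text.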

Plagiarism software may also contain features to detect obfuscations such as altering copied texts or filling in letters made invisible (by using white color) into the spaces between words. Plagiarism software such as Turnitin does not indicate “whether plagiarism has occurred as it does not identify whether a student has appropriately referenced, quoted, and/or paraphrased” (Silvey et al., 2016).
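The white-letter obfuscation mentioned above can be illustrated with a short sketch. It assumes a hypothetical document model in which text arrives as runs, each with a font color; real file formats and real detectors are more complex, and the Run class and both function names are inventions for this example. The idea is simply that letters in the background color are invisible to a reader but turn the extracted text into long pseudo-words that defeat word-based matching.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """A span of text in one font color (hypothetical document model)."""
    text: str
    color: str = "000000"  # hex RGB; black by default


def hidden_runs(runs, background: str = "FFFFFF"):
    """Return non-blank runs whose color equals the page background."""
    return [
        r for r in runs
        if r.color.upper() == background.upper() and r.text.strip()
    ]


def visible_text(runs, background: str = "FFFFFF") -> str:
    """Rebuild the text as a reader sees it: hidden runs become spaces."""
    return "".join(
        " " if r.color.upper() == background.upper() else r.text
        for r in runs
    )


if __name__ == "__main__":
    runs = [Run("the"), Run("x", "FFFFFF"), Run("quick"), Run("q", "ffffff"), Run("fox")]
    print("".join(r.text for r in runs))  # what naive text extraction yields
    print(visible_text(runs))             # what the reader actually sees
    print(len(hidden_runs(runs)), "hidden run(s) flagged")
```

A tool could then run its text matching on the reader-visible version and report the flagged runs as a sign of manipulation.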

Algorithms for intertextuality software may work on different principles that may be combined but usually are not disclosed to their users. For a further explanation of how plagiarism detection software works, see Bailey (2016) and Eisa et al. (2015).

4 Main Products

The prototype for plagiarism software is Turnitin, simply because it has been the most successful at selling its products to institutions and is used in over 100 countries. Originally developed by iParadigms, an educational technology company founded by researchers at the University of California at Berkeley, it was then sold to investors in 2014. Silvey et al. (2016) note that Turnitin is used by 90% of Australian universities in one form or another, and Barrie (2008) claims that 95% of UK institutions use Turnitin. In the US, where plagiarism detection tools are controversial and have met with significant resistance among many writing-studies specialists, the number may be smaller. iParadigms also created an informational web site for plagiarism, www.plagiarism.org, which is sponsored by Turnitin and addresses students as well as faculty.

Turnitin.com has changed its web emphasis from plagiarism detection to support for student creativity and for upholding academic integrity. As of this writing, its services are split into five areas:

  • Originality: This tool is a plagiarism detector indicating similarities of papers with web-based texts; it includes the teaching of referencing and may be offered to students for self-checks of plagiarism.

  • Gradescope: This tool offers grading services in collaboration with teachers who indicate criteria for evaluation.

  • iThenticate: As mentioned, this tool compares content against existing literature but focuses on published work and is therefore often used by academics and professionals. It supports the development of focus, the detection of similarities to other papers, manuscript development, and collaboration.

  • Similarity: This tool is a pure plagiarism checker that shows similarities to existing papers, displays the original literature, and is sensitive to manipulations and attempts to hide plagiarism.

  • Revision assistant: This tool offers feedback to students about intertextuality but also about various other issues (see Mayfield & Adamson, 2016).

Turnitin compares submissions with all internet material available and with all student papers ever submitted to Turnitin (so that students cannot “reuse” material from their peers’ previously submitted papers). It cannot access internet materials stored behind paywalls or print-only materials, but in some versions it seems to have access to books issued by a large number of publishers. When it started, Turnitin relied mainly on a corpus of all submitted student papers; however, forcing students to submit their work for permanent “ownership” by a for-profit corporation met with considerable concern among some educators. Today, it maintains web crawlers to access all relevant internet materials.

The exact number of currently existing plagiarism detectors is unknown; many are somewhat more primitive versions of Turnitin or Grammarly. There are many local developments in various languages which are hard to access. Based on research into their effectiveness, Plagiat Portal classified 26 plagiarism detection tools into three categories: “partially useful systems” (Plagaware, Turnitin, etc.); “barely useful systems for education” (Plagiarism Finder, Docoloc, etc.), and “useless systems for education” (iPlagiarismCheck, Catch It First, etc.). A number of learning management systems, such as Moodle, allow for the addition of plagiarism detection tools into their platforms for easy access.

5 Research

It is beyond the scope of this chapter to refer to all the abundant research on plagiarism detection (see Bretag, 2016, for international perspectives). Foltýnek et al. (2020) offer an extended review of plagiarism literature that differentiates three levels:

  • Plagiarism detection methods refer to the automated identification of intertextual elements by varying algorithms.

  • Plagiarism detection systems refer to tools ready for use, including commercial offers such as Turnitin.

  • Plagiarism policies refer to research on “the prevention, detection, prosecution, and punishment of plagiarism at educational institutions” or to publications analysing the occurrence or forms of plagiarism and the institutional reactions to it.

For an understanding of plagiarism software, comparative research is essential. Comparisons can be made across different tools, different types of plagiarism, and uses in different languages. Because some tools are continuously updated, others disappear, and still others are newly launched, such comparisons are continuously necessary, but their results do not last long. They help develop the field and the tools more than they produce cumulative results.

The most thorough comparison of available software has been carried out by a group of nine members of the European Network for Academic Integrity (Foltýnek et al., 2020) in which 15 text-matching systems were compared. A large number of languages from the Germanic, Romanic, and Slavic language families were included and a differentiated set of texts with varying kinds of plagiarisms (including obfuscation, translation, and paraphrasing) was used.

Many studies of plagiarism detection have focused on their pedagogical implications (Anson, 2011), the way they define plagiarism or students committing it (Canzonetta & Kannan, 2016), or the sources of resistance toward detection tools (Vie, 2013). Studies of student and faculty attitudes toward plagiarism detection software show mixed results; Atkinson and Yeoh (2008), for example, found some positive attitudes by both instructors and students toward the software, but just as many concerns, including (for students) worrying that too much emphasis could be placed on detection and not the quality of their writing, and (for instructors) the extra work involved in the process of detection and the process of pursuing academic misconduct; Savage (2004) reported similar results. Dahl (2007) found that postgraduate students looked upon Turnitin mostly favorably, but a few were less certain, perhaps because of concerns about their ability to cite sources correctly. In a study of instructors’ attitudes toward plagiarism and Turnitin, Bruton and Childers (2016) found varying attitudes toward the software, as well as contradictions between instructors’ sense that much plagiarism is a forgivable lack of skill and the strict policies on their syllabi.

It is not clear whether knowing that their papers will be submitted to a plagiarism detection system will deter students from plagiarizing. In one study (Youmans, 2011), half the students in two sections of a psychology course were informed that their papers would be submitted to Turnitin.com and half were not. However, the forewarned students did not plagiarize to a lesser extent than those who were not informed. To test the possibility that students did not know the effectiveness of Turnitin or how it works, a follow-up study reported in the same article controlled for this knowledge. However, students who were informed about Turnitin’s mechanisms did not plagiarize to a lesser extent than those who were not informed. The author speculated that the challenges of source use may have overridden students’ abilities to avoid unintentionally borrowing material they consulted.

Research on plagiarism detection software used instructionally rather than punitively has shown generally positive results. A comparative study of students receiving conventional anti-plagiarism instruction and others using the software as a learning tool resulted in significant reductions in plagiarism among the latter group (Stappenbelt & Rowles, 2009). Halgamuge (2017) found that formative uses of plagiarism detection software yielded “a substantial benefit in using Turnitin as an educational writing tool rather than a punitive tool.” Rolfe (2011) found that both instructors and students had positive impressions after using plagiarism detection software formatively. And Davis and Carroll (2009) found that when used together with tutorial-like questions, Turnitin originality reports “appeared to have a positive effect on students’ understanding of academic integrity reflected in improved drafts.”

Analyses of the accuracy of plagiarism detection tools have revealed their limitations; Plagiat Portal (cited above) found that, using rigorous standards, the “best” systems were no more than 60–70% accurate. Perhaps the most extensive research on the accuracy of plagiarism detection tools is a series of studies by Weber-Wulff conducted between 2004 and 2013 and summarized in Weber-Wulff (2015), who concludes that although some systems “can identify some text parallels that could constitute plagiarism … the reports are often not easy to interpret correctly, software can flag correctly referenced material as non-original content, and there are cases in which systems report no problems at all for heavily plagiarized texts” (p. 625). A study by Purdy (2003) confirmed these findings. Mozgovoy et al. (2010) analyze the most promising detection systems and offer a roadmap for further developments.

6 Implications

It is not fully known what effect plagiarism detection tools have on novices’ or experts’ composing processes. Typically, the software operates either on whole texts in draft form, which are submitted so that any questionable material can be appropriately revised or unattributed material appropriately cited, or on finished (submitted) text as a way to detect plagiarism and remediate or punish the writer. However, as mentioned, students’ awareness that their writing may be submitted for plagiarism detection could create anxiety or lead to “safe” writing that does not rise to the standards of complexity required of academic writers.

One possible application of plagiarism detection tools would require students to study the results of their paper’s submission and then analyze any false positive or false negative matches and write a parallel paper or reflection explaining what should or should not be changed or what should be retained because of limitations in the software.

It is also not clear whether plagiarism detection tools result in stronger writing quality, since they focus only on text attribution—unless this is included as a feature in primary trait scoring of students’ writing (see Howard, 2007). However, if instructors respond to students’ drafts in progress after submitting them to a plagiarism detection system, and then offer advice based on the results, we might predict that the quality of writing will improve.

Further implications include ethical concerns that commercial interests such as Turnitin.com acquire some level of “ownership” of the work students are forced to submit as a course requirement. In addition, teacher-student relationships can be affected when students are suspected of possible plagiarism (by having their work screened) before they have done anything wrong.

7 List of Tools

Only current products previously rated as “partially useful” by Plagiat Portal are included:

| Software | Description | URL |
|---|---|---|
| Turnitin | Plagiarism detection; proprietary; Web-based; can be incorporated into LMSs; text matching; includes other products such as assessment and feedback support | Turnitin.com |
| Plagaware | Plagiarism detection; freemium; Web-based; text matching; texts must be uploaded individually | http://plagaware.com |
| Plagscan | Plagiarism detection; proprietary; Web-based; text matching; three types of reports; includes source links | http://www.plagscan.com |
| Urkund | Plagiarism detection; freemium; Web-based; can be incorporated into some LMSs; text matching; “detects ghostwriting”; includes writing style analysis | http://www.urkund.com |