Paraphrasing tools, language translation tools and plagiarism: an exploratory study

  • Felicity M. Prentice
  • Clare E. Kinden
Open Access
Original article
Part of the following topical collections:
  1. Machine-based plagiarism: The death of originality in the digital age?

Abstract

In a recent unit of study in an undergraduate Health Sciences pathway course, we identified a set of essays which exhibited similarity of content but demonstrated the use of bizarre and unidiomatic language. One of the distinct features of the essays was the inclusion of unusual synonyms in place of expected standard medical terminology.

We suspected the use of online paraphrasing tools, but were also interested in investigating the possibility of the use of online language translation tools. In order to test the outputs of these tools, we used as a seed document a corpus of text which had been provided to the students as prompt for the essay. This document was put through six free online paraphrasing tools and six separate iterative language translations through the online Google Translate™ tool.

The results demonstrated that free online paraphrasing tools did not identify medical terminology as standardised or accepted nomenclature and substituted synonyms, whereas Google Translate™ largely preserved medical terminology.

We believe that textual indicators such as the absence of standard discipline-based terminology may be of assistance in the identification of machine paraphrased text.

Keywords

Paraphrasing tools; Patchwriting; Plagiarism; Online language translation; Medical terminology

Abbreviations

CAT Scan: Computerised Axial Tomography Scan

EAL: English as an Additional Language

ED: Emergency Department

LOTE: Language other than English

Introduction

Imagine you are reading a student’s essay and are confronted with the following sentence:

A situation that can give resistance and additionally generally safe for botches, and that inspects choices without assaulting the pride and nobility of the individual influencing them, to will prompt better natural decisions.

In an assessment task set for first year undergraduate Health Science students in a pathway program, an alarming proportion of submitted work, nearly 10%, demonstrated linguistic contortions similar to the example given. This led us to consider the following questions:
  1. Were students using online paraphrasing tools to manipulate work which was written in English and which had not been authored by them?
  2. Were students who had English as an Additional Language (EAL) composing work in their first language and then translating this through online language translation tools?
  3. Are there indicators which can identify the use of online paraphrasing tools?

All examples of unusual writing provided in this article are indicative of the nature of the student writing encountered but have been altered to retain anonymity while preserving the features of the linguistic anomalies.

While standards of English expression may vary considerably in work submitted by students, it is becoming more common to encounter essays which display standards of writing well below that which is expected of students studying in Higher Education. When the student is from an English as an Additional Language (EAL) background, poor expression in written work has been attributed to lack of facility with the language, clumsy patchwriting, or the use of an online translation tool, such as Google Translate™ (n.d.) (https://translate.google.com.au). Mundt and Groves (2016) contend that when students use an online translation tool to convert their own work from their first language into English this may be considered demonstrative of poor academic practice, as they are not actively developing English language skills. However, as the original work is the result of the student’s own intellectual merit, it is contentious as to whether this qualifies as academic misconduct. In the case of the submissions we received there was reasonable suspicion that the text had not been subject to a language translation tool but had been reengineered by an English-to-English paraphrasing tool. This called into question the source of the original English text, and suggested there was evidence of a genuine breach of academic integrity.

Rogerson and McCarthy (2017) reported that their initial awareness of paraphrasing tools was through a casual comment by a student. In our case, the serendipitous discovery of online paraphrasing tools was made when one of the authors was following an online forum discussing cheating methods. Prior to this revelation, our assumptions as to the origin of incomprehensible student writing had been more naïve, our explanations being focussed around patchwriting and LOTE-to-English translation tools. However, when encountering the extent of the use of inappropriate synonyms in essays submitted for this particular assessment task, we were moved to examine the text more closely. A review of one or two essays rapidly escalated to the identification of a cluster of essays which bore remarkable similarity in the use of peculiar language, and in particular the inclusion of bizarre synonyms for standard recognised terminology within the health sciences discipline. Further to this, there was significant similarity in the structure of the essays, where the information, and even in-text citations, were provided in an identical sequence. In some cases, the Turnitin® (n.d.) similarity index identified a match between a number of essays, but other suspicious works resulted in an index of 0%. It became clear that paraphrasing tools were probably being used and that students were colluding to paraphrase each other’s essays.

The literature is replete with the lamentations of academics who feel that pursuing academic misconduct forces them into the role of detective. Collecting evidence, analysing scenarios, motives and prior offences, and operating in a quasi-judicial, if not criminological, paradigm does not sit well within the cultural norms of academia (Brimble and Stevenson-Clarke 2006; Burke and Sanney 2018; Coren 2011; Keith-Spiegel et al. 1998; Sutherland-Smith 2005; Thomas and De Bruin 2012). Our experiences resonated so clearly with this sentiment that we felt a profound urge to recreate a television crime show, with essays taped to the wall connected by string, surrounded by tacked-up maps and photographs of the suspects.

The breakthrough came when an essay was so alarmingly absurd that we were able to trace the origin to another student’s essay. The assessment task was to analyse and discuss a scenario regarding a young Indigenous man’s experiences in the Australian Health Care System.

One student included in their essay a description of a Computerised Axial Tomography (CAT) scan which had been plagiarised from a Wikipedia page. However, in transcribing how images were taken from various angles, they had misspelled the word ‘angles’ as ‘angels’. This spelling error alone had not caused concern; however, work submitted by another student provided evidence of a curious literary connection between the essays. The second student reported that the CAT Scan images were taken from various ‘Blessed Messengers’.

It was apparent that the second student had used a paraphrasing tool to ‘spin’, that is, to apply synonym substitution, to the essay obtained from their colleague.

Given the poor standard of the output, why would a student resort to using paraphrasing tools? Paraphrasing is a complex and demanding task, requiring students to demonstrate not only understanding of the meaning and purpose of the text, but also to find the linguistic facility to restate this meaning in new and original words, and specifically in the discourse of Academic English (Shi 2006). This task is difficult enough when performed in a first language, and the challenge is magnified when the student is from a non-English speaking background (Bretag 2007; Carroll 2015; Correa 2011; Handa and Power 2005; Marshall and Garry 2006).

Bretag (2007) describes two aspects of the acquisition of a second language. Basic interpersonal communication skills can be developed in approximately two years; however, it is estimated to take five to ten years to develop the cognitive academic linguistic proficiency which is necessary to function in an academic learning environment. Patchwriting is when students attempt to paraphrase a source by substituting synonyms in passages while retaining too closely the voice of the original writer (Jamieson 2015). This may be classified as an intermediary stage of the development of academic linguistic proficiency representing a form of non-prototypical plagiarism (Pecorari 2003). As such, it may not be a deliberate or intentional breach of academic conduct. In students with EAL, the acquisition of the linguistic facility to represent the meaning of a text without resorting to reproducing the author’s actual words may take more than the few months that our students have been studying at an English-speaking University. However, in the cases under consideration, students did not attempt to manually re-engineer text in order to paraphrase but used an online paraphrasing tool to alter the entire corpus of the text. The original source text could be identified in many cases by a recognition of some structural features, for example, the reproduction of the scenario provided to the students.

Original

One day, while Doug was out walking, he felt lightheaded and then lost consciousness and fell to the ground. He was brought to the Emergency Department of a major hospital by ambulance for assessment and investigation.

Post paraphrasing tool

While one day on his walk Doug he felt bleary eyed and lost awareness and fell onto the ground. He was conveyed to the Emergency Department of the healing facility for significant appraisals and tests.

In some cases the original source was taken from the internet, notably Wikipedia, but in one instance the student lifted and paraphrased text taken directly from a file sharing site. The student did not provide an in-text citation; however, the original source was identified because the student included the file sharing website address in the reference list. This has been referred to as illicit paraphrasing (Curtis and Vardanega 2016), and actions such as this may call into question the level of intentionality to deceive. The inclusion of a reference, albeit from an inappropriate source, may suggest the student was attempting to participate in the expectations of academic practice. Less generously, it may be assumed that copying material directly from a file sharing site, using a paraphrasing tool to deceive Turnitin® (n.d.), and then submitting the work, even with a hopeful inclusion in the reference list, demonstrated an intentional breach of academic integrity.

Patchwriting

Strategic word substitution has always been a feature of students’ attempts at paraphrasing, which Howard defined as patchwriting:

Copying from a source text and then deleting some words, altering grammatical structures, or plugging in one synonym for another.

(Howard 1999, p.xvii, in Jamieson 2015)

While patchwriting by students has been characterised as poor academic practice, it is also seen as a preliminary effort to become familiar with the discourse of academic writing (Pecorari 2003).

In the essays considered in this exploratory study, we encountered examples of English expression which indicated that the EAL student was struggling to develop fluency, for example:

Doug leaves his home and move far away from his family to the city. There he have house with an unknown people and he have feeling of loneliness and unhappy. He is not able to get the job and had very small income. He was usually sad and feel bad in himself. It is all these factors lead to a poor health.

We were also able to recognise patchwriting in text that had been appropriated from multiple sources, and these incidents were usually identified by Turnitin® (n.d.) and exemplified by a ‘rainbow’ of colours in the similarity report demonstrating different sources. However, in the essays under investigation the text demonstrated the inclusion of synonyms resulting in writing which was largely unintelligible. Further to this, there had been no manipulation of the syntax of the sentences, which heightened the unidiomatic nature of the writing. Whereas in patchwriting synonyms are manually substituted by the student, online paraphrasing tools achieve this through an automatic function, and thus the question arises, as posited by Rogerson and McCarthy (2017), as to whether the use of online paraphrasing tools transcends patchwriting to become what Walker describes as illicit paraphrasing (in Pecorari 2003, p.9).

Expected medical terminology

One of the most obvious issues we encountered in the essays was the use of synonyms for standard medical terminology. Standardised nomenclature and terminology are employed throughout health care to avoid ambiguity in documentation and communication. This provides the interface for meaningful and appropriate communication of medical, nursing and allied health information regarding patient care, and is an essential element of safety and standardisation in care (Pearson and Aromataris 2009). In addition, this terminology is used for medical information classification, and has been raised as a priority area in the introduction of electronic health records to ensure interoperability across systems and health disciplines (Monsen et al. 2010). The importance of employing correct and predictable terminology has been identified as paramount in avoiding adverse outcomes:

Current research indicates that ineffective communication among health care professionals is one of the leading causes of medical errors and patient harm.

(Dingley et al. 2008, p.1)

Therefore, the acquisition and correct contextual application of medical terminology is a fundamental part of learning in health sciences. Students are exposed to this terminology throughout their studies, and in the case of the assessment task under scrutiny, students were provided a scenario, or enquiry prompt, which included the standard discipline-based terminology (see Appendix). The lack of standard medical terminology and the inclusion of unusual synonyms for this terminology was a significant feature of the essays. In the event that students were exhibiting difficulties with English expression, or were manually substituting synonyms as seen in patchwriting, it would be expected that the standard terminology would be preserved. This led us to suspect, and subsequently investigate, online paraphrasing tools.

Paraphrasing tools

Spinning is a technique used to produce a new document, or documents, from an original text source by replacing words in such a way as to retain the overall meaning of the text, while avoiding machine-based text matching tools used to identify plagiarism. Machine-based paraphrasing tools were developed to enable text spinning as a way of improving website rankings in Google search results and are part of a suite of search engine optimisation (SEO) techniques referred to as Black Hat marketing (Lancaster and Clarke 2009; Rogerson and McCarthy 2017; Zhang et al. 2014).

In web-based marketing the goal is to get the highest ranked place in a Google search index.

The Google search engine identifies and calculates the frequency of links between, and website traffic to, each website and ranks sites on the search results accordingly. In Black Hat marketing, the aim is to create sites including blogs, articles and webpages which provide multiple links to the target page, thus ensuring optimisation of the search engine results and a higher overall ranking (Bailey 2018).

Google search engines use word matching software which can recognise duplicate text and penalties are applied where this has been detected, hence the need to create paraphrasing tools which will instantly produce duplicate text material which cannot be detected. These paraphrasing tools were designed to hoodwink word matching software but were not intended to emulate human generated text. It is apparent that students are now using these tools to spin text from numerous original sources with the aim to deceive word matching software such as Turnitin® (n.d.).

The free online automated paraphrasing tools rely principally on synonym substitution without altering the overall syntax of the sentence, resulting in language which is unidiomatic at best, incomprehensible at worst.

When Rogerson and McCarthy published in 2017, they reported that a simple Google search for paraphrasing tools resulted in over 550,000 hits. Our search in 2018 demonstrated a proliferation of paraphrasing sites resulting in over 3,320,000 hits. Cursory examination revealed that many are duplicate sites with the same tool offered under different names. Of greater concern is the increased juxtaposition of advertisements and links to essay purchasing services. Anticipating the vulnerability of the student, some sites offer a free paraphrasing tool but ensure the output is extremely poor.

For example, when the following sentence taken from the assessment scenario:

One day, while Doug was out walking, he felt lightheaded and then lost consciousness and fell to the ground. He was brought to the Emergency Department of a major hospital by ambulance for assessment and investigation.

was entered into free online paraphrasing tools, the following results were obtained:

Brace girl, stretch Doug was at large peripatetic, he felt lightheaded and fit lost consciousness and fell to the ground. He was debasement to the Danger Diversify of a chief sanatorium by ambulance for weight and criticism.

Plagiarisma http://plagiarisma.net/spinner.php

One sidereal day, while Doug was out walk, he felt lightheaded and then lost knowingness and downslope to the pulverization. He was brought to the Emergency Department of a major hospital by ambulance for assessment and probe.

Rephraser https://www.rephraser.net/instant-paraphrasing-tool/

This word salad is used to entice students into contract cheating, that is, outsourcing the assessment task to be completed by a third party (Lancaster and Clarke 2006). The sites provide a link to an essay writing service, in one case with a curiously poorly worded advertisement stating:

Aren’t satisfied with the results? But what to expect from the tool? Hire an expert for a quality rewording! Only $8.39/page.

Paraphrasing Online https://www.paraphrasingonline.com

Paraphrasing tools work by creating an intermediate text referred to as “spintax”, where a number of synonyms are provided for each selected word, for example the phrase:

the junior doctor in the rehabilitation centre prepared a discharge summary

is transformed into the intermediary spintax:

the {understudy specialist | lesser specialist | lesser pro} in the {recovery fixate | recovery focus | rebuilding centre} prepared a {release rundown | release report | blueprint}.

Based on a number of parameters, words can be substituted at varying rates within a sentence; however, the selection of synonyms is non-deterministic. Therefore, for the purpose of Black Hat marketing, this provides a vast number of permutations for the creation of articles which are sufficiently different from each other to evade detection by word matching software (Bailey 2018). This explains why students using paraphrasing tools may generate apparently different essays from a single seed document.
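The way a single spintax string fans out into many different texts can be illustrated with a short sketch. The Python below is not the code of any actual paraphrasing tool; the function names `spin` and `count_variants` are hypothetical, and the only input is the spintax example given above. It assumes groups are written as `{a | b | c}` and are not nested.

```python
import random
import re

# Matches one spintax group such as "{release rundown | release report | blueprint}".
# Assumption: groups are not nested.
GROUP = re.compile(r"\{([^{}]*)\}")


def spin(spintax, rng):
    """Replace every {a | b | c} group with one randomly chosen option."""
    def pick(match):
        options = [opt.strip() for opt in match.group(1).split("|")]
        return rng.choice(options)
    return GROUP.sub(pick, spintax)


def count_variants(spintax):
    """Number of option combinations (the product of the group sizes)."""
    total = 1
    for match in GROUP.finditer(spintax):
        total *= len(match.group(1).split("|"))
    return total


# The spintax example quoted in the text.
SPINTAX = (
    "the {understudy specialist | lesser specialist | lesser pro} in the "
    "{recovery fixate | recovery focus | rebuilding centre} prepared a "
    "{release rundown | release report | blueprint}"
)

if __name__ == "__main__":
    rng = random.Random()  # unseeded, so each run may give different output
    for _ in range(3):
        print(spin(SPINTAX, rng))
    print(count_variants(SPINTAX), "possible variants")
```

Even this one short phrase, with three groups of three options each, allows 27 combinations, which suggests how a whole essay could yield outputs that look unrelated to word matching software.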

To create the spintax, a bank of potentially alternative terms is held in a synonym dictionary, which may be local to the paraphrasing tool, or held in cloud storage (Shahid et al. 2017; Zhang et al. 2014). In their study, Zhang et al. (2014) were able to access this dictionary and reverse engineer two paraphrasing tools (Plagiarisma and The Best Spinner) to establish which words are subject to synonym substitution, referred to as ‘mutables’, and which words do not appear in the synonym dictionary and thus would not be included in the spintax, referred to as ‘immutables’. This approach, referred to as DSpin, relies on comparing the unchanged text, or immutables, located within the spun text to the original text (Zhang et al. 2014). The match of immutable terms between documents (spun and original) will provide evidence of the source of the text. We became interested in the concept of immutable words and how these may be used to identify documents that had been machine paraphrased.
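The idea of matching immutables can be sketched in a few lines of Python. This is a simplified illustration, not the DSpin implementation of Zhang et al. (2014): the toy synonym dictionary, the function names, and the use of a Jaccard overlap score are all assumptions made for the example. The two texts are the scenario sentence and the Rephraser output quoted earlier.

```python
import re


def tokens(text):
    """Lower-case word tokens; punctuation is dropped."""
    return re.findall(r"[a-z']+", text.lower())


def immutables(text, synonym_dictionary):
    """Words that do not appear in the spinner's synonym dictionary."""
    return {word for word in tokens(text) if word not in synonym_dictionary}


def immutable_overlap(original, suspect, synonym_dictionary):
    """Jaccard overlap of the two immutable sets (assumed scoring choice)."""
    a = immutables(original, synonym_dictionary)
    b = immutables(suspect, synonym_dictionary)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Toy dictionary, invented for illustration: each mutable word and its substitute.
TOY_DICTIONARY = {
    "day", "sidereal", "walking", "walk", "consciousness", "knowingness",
    "fell", "downslope", "ground", "pulverization", "investigation", "probe",
}

ORIGINAL = (
    "One day, while Doug was out walking, he felt lightheaded and then lost "
    "consciousness and fell to the ground. He was brought to the Emergency "
    "Department of a major hospital by ambulance for assessment and investigation."
)
SPUN = (
    "One sidereal day, while Doug was out walk, he felt lightheaded and then lost "
    "knowingness and downslope to the pulverization. He was brought to the "
    "Emergency Department of a major hospital by ambulance for assessment and probe."
)

if __name__ == "__main__":
    print(immutable_overlap(ORIGINAL, SPUN, TOY_DICTIONARY))
    print(immutable_overlap(ORIGINAL, "The cat sat on the mat.", TOY_DICTIONARY))
```

With this toy dictionary the spun text shares all of its immutables with the original, while an unrelated sentence shares almost none. In practice the result depends entirely on having access to the tool's real dictionary.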

The paraphrasing tools that require a fee-based subscription provide a large number of parameters to manipulate the output, including the contents of the dictionary, the maximum number of synonyms used and replacement frequency, and the replacement of both single words and short phrases (Shahid et al. 2017). In this study we assumed that the students were accessing the free versions of online paraphrasing tools and that, as a result, the output of spinning was less subject to control, resulting in more words treated as mutables and thus less discretionary synonym substitution.

As medical terminology is fundamental to the discourse of health sciences, it would be reasonable to classify these words as preferentially immutable. However, the paraphrasing tools do not have the capacity to recognise the significance and importance of these terms, and thus they are within the synonym dictionary as mutables and subject to synonym substitution.

Students in this unit of study are exposed to medical terminology throughout the curriculum, and it is emphasised that these terms are fundamental to the discourse and required for communication in health sciences. Hyland (2006) notes that becoming a member of a discourse community involves “learning to use language in disciplinary approved ways” (p.38). They are expected to use these terms, and it is clear in the rubric and marking guides that the assessment is aligned to the objective of the acquisition of this specialised language. The scenario provided in this assessment was rich and replete with the terminology, and there was ample opportunity for imitation and reproduction of the writing style and nomenclature. Therefore, the absence of the recognised terminology and the inclusion of unidiomatic and contextually invalid synonyms was particularly obvious to the readers.
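A reader's impression that expected terminology is missing could be supported by a simple mechanical check. The following Python is a minimal sketch under stated assumptions: the function name `missing_terms` and the short term list are hypothetical, and the two sample sentences are taken from the original and paraphrased scenario quoted earlier. An absent term is only a prompt for closer human reading, not evidence of misconduct in itself.

```python
import re


def missing_terms(essay, expected_terms):
    """Return the expected terms that never appear in the essay.

    Matching is case-insensitive, on whole words or phrases, and tolerates a
    simple plural "s". Other inflections are not handled in this sketch.
    """
    text = essay.lower()
    missing = []
    for term in expected_terms:
        pattern = r"\b" + re.escape(term.lower()) + r"s?\b"
        if not re.search(pattern, text):
            missing.append(term)
    return missing


# Illustrative term list only; a real list would come from the scenario and rubric.
TERMS = ["Emergency Department", "hospital", "ambulance"]

ORIGINAL = (
    "He was brought to the Emergency Department of a major hospital by "
    "ambulance for assessment and investigation."
)
PARAPHRASED = (
    "He was conveyed to the Emergency Department of the healing facility for "
    "significant appraisals and tests."
)

if __name__ == "__main__":
    print(missing_terms(ORIGINAL, TERMS))
    print(missing_terms(PARAPHRASED, TERMS))
```

On these samples the original sentence contains every listed term, whereas the paraphrased sentence lacks "hospital" and "ambulance".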

Method of analysis

Identifying the use of paraphrasing tools

It could be argued that the use of synonyms, in particular archaic or unidiomatic words and phrases, is a clear indicator that machine generated paraphrasing has been used. For example, in the papers submitted by students where the use of paraphrasing tools was suspected, the term aboriginal man was substituted with autochthonic person, the hospital became the mending office, the rehabilitation centre the recovery fixate, and the discharge summary the release precis.

In order to investigate the extent to which paraphrasing tools substituted recognised and expected medical terms for unusual synonyms, we selected three essays which we had identified as particularly unusual. We did not know the provenance of these essays, although there was structural evidence that they might have arisen from a single seed document which was an essay submitted by one student in the current cohort.

Table 1 shows the variation from the expected nomenclature.
Table 1

Synonyms used in essays submitted by students suspected of using paraphrasing tools

| Expected terminology | Student 1 | Student 2 | Student 3 |
|---|---|---|---|
| Hospital | Healing facility; Healing centre; Doctor’s facility | Doctor’s facility | Healing facility |
| Health professionals | Human services experts; Wellbeing experts | Healing centre staff individuals | |
| Emergency departments | Crisis office; Crisis division | Crisis group | Crisis office; Crisis division; Emergency office; Emergency division |
| Nurse | Medical attendant | Attendant | Medical attendant |
| Nurse Unit Manager | Medical attendant unit supervisor | Attendant unit administrator | Administration officer |
| Rehabilitation centre | Recovery focus; Restoration focus; Recovery fixate | Recovery focus; Restoration focus; Recovery ward | Recuperation centre; Restoration focus; Rebuilding centre |
| Insulin shock | Insulin stun | Insulin stun | Insulin stun |
| Health literacy | Wellbeing proficiency/aptitude | Wellbeing proficiency | |
| Health services | Wellbeing administrations | | Wellbeing administrations |
| Discharge summary | Release rundown | Release rundown; Release report; Release synopsis | Release rundown; Blueprint |
| Junior doctor | Understudy specialist | Lesser specialist | Lesser pro |
| Medical record | | Medicinal record; Medicinal report | Therapeutic record; Therapeutic history |
| Physiotherapist | Restorative practitioner | | |
| Acute hospital | | Recuperation advance | Recuperating office; Mending office |


Comparing online language translation and paraphrasing tools

Prior to learning of the existence of online paraphrasing tools, we had assumed that students were authoring work in their first language, and then using online translation tools to convert the text to English. Perhaps the most notable and available online free translation tool, Google Translate™, was made available as an online tool in 2006 using a statistical machine translation engine to translate text from one language, via English, on to the target language. In 2016 Google implemented a Neural Machine Translation engine, which has provided a more sophisticated and accurate output (Le and Schuster 2016). Given the idiomatic nature of language, errors may still occur where a word is translated into a synonym which may not be contextually valid.

To investigate the possibility that students had used Google Translate™, the scenario provided as the enquiry-based learning prompt was used as a seed document to ascertain the changes which might occur when paraphrasing tools and Google Translate™ were employed. The scenario (Appendix) was put through a number of paraphrasing tools, and in each case the standard medical terminology was consistently changed. When the scenario was put through Google Translate™, the terminology was changed only rarely.

The scenario document was subject to iterative language translation (Day et al. 2016). The text was entered into Google Translate™ for translation to a language other than English, and this translation was copied and re-entered to a refreshed Google Translate™ page for translation back into English. The target languages used were Arabic, Punjabi, Hindi, Chinese (Simplified), Chinese (Traditional) and Vietnamese. The languages were chosen as they represent the principal first languages of the EAL students enrolled in this subject.
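The iterative translation procedure can be outlined in code. The sketch below is hedged: it does not call Google Translate™ or any real translation API. The `translate` argument is a hypothetical callable with an assumed signature `(text, source_lang, target_lang)`, and `fake_translate` is a stand-in with placeholder language codes, invented only so that the example runs.

```python
def round_trip(text, pivot, translate):
    """Translate English -> pivot language -> English using the supplied callable."""
    forward = translate(text, "en", pivot)
    return translate(forward, pivot, "en")


def changed_terms(text, pivots, terms, translate):
    """For each pivot language, list the expected terms lost in the round trip."""
    result = {}
    for pivot in pivots:
        back = round_trip(text, pivot, translate).lower()
        result[pivot] = [term for term in terms if term.lower() not in back]
    return result


def fake_translate(text, source, target):
    """Stand-in translator (not a real service).

    Outbound text is only tagged with the target code. On the way back the tag
    is removed, and for the placeholder language "lang_a" one term is swapped
    to mimic an imperfect round trip.
    """
    if target != "en":
        return "<" + target + ">" + text
    body = text.split(">", 1)[1]
    if source == "lang_a":
        body = body.replace("handover", "delivery")
    return body


if __name__ == "__main__":
    sample = "The nurse gave a handover on the ward."
    print(changed_terms(sample, ["lang_a", "lang_b"], ["handover", "ward"], fake_translate))
```

Passing the translator in as a parameter keeps the comparison logic separate from whichever translation service is actually used.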

The translations were of a generally good quality, displaying minor errors in tense and pronoun gender, but could be easily comprehended. The most accurate translations were Chinese (Simplified and Traditional) and Vietnamese, and the highest number of errors occurred in Arabic, Hindi and Punjabi. In the latter languages there were more substitutions for standardised health terms (Table 2).
Table 2

Iterative translation through Google Translate™

English

Arabic

Punjabi

Hindi

Chinese (a)

Vietnamese

Handover

delivery

existence

handsover

  

CAT scan

CAT screening

   

CAT screening

Ward

Wing

Suite

branch

   

department

Occupational therapist

Career therapist

    

Discharge summary

 

Discharge brief

  

Summary of the circulation

Aboriginal

 

Original

tribal

  

Emergency Department

   

Education Department

Emergency room

(a) Chinese (Simplified) and Chinese (Traditional) produced the same output

The original scenario was then put through six paraphrasing tools selected as the top entries generated by a Google search using the term ‘paraphrasing tools’. This technique follows that used by Rogerson and McCarthy (2017) based on the assumption that students would use a similar search strategy and select the sites listed at the top of the search results (Table 3).

It was not known whether these sites were using the same paraphrasing tool, however, given the multiple outputs available through non-discriminatory synonym substitution, there was ample opportunity for a diverse output.

The results from the output texts were analysed for synonym substitution of recognised and expected medical terminology, and this was compared to the outputs from the iterative language translation through Google Translate™. This technique was used for convenience purposes as the intention was to gain an overall impression of the extent to which medical terms were substituted by paraphrasing tools compared to Google Translate™. As can be seen from Table 4, the proportion of substituted terms was markedly different. From the 21 standard medical terms there were 73 synonyms from the paraphrasing tools and 7 alternative terms from Google Translate™. Blank spaces in the table indicate that no alternative term was generated by Google Translate™.
Table 4

Comparison of synonyms for medical terms generated by paraphrasing tools and iterative language translation through Google Translate™

| Standard nomenclature | Paraphrased term | Google Translate™ |
|---|---|---|
| Aboriginal man | Native fellow; Autochthonic person; Native man; Native individuals | Original man; Tribal man |
| Emergency Department | Crisis centre; Crisis division; Crisis office; Crisis group ward; Emergency office; Emergency division; Crisis branch; Emergency branch; Erectile dysfunction (in reference to the abbreviation ED) | Education department; Emergency room |
| Type 1 (Diabetes Mellitus) | Sort 1; Kind 1 | |
| Insulin shock | Insulin stun; Insulin surprise | |
| Patient centred care | Tolerant focussed care | |
| Hospital | Healing facility; Doctor’s facility; Healing centre; Mending office; sanatorium | |
| Health professionals | Wellbeing experts; Fitness experts | |
| Health literacy | Wellbeing education; Welling proficiency aptitudes; Wellbeing education abilities | |
| Rehabilitation centre | Restoration focus; Recovery fixate; Rebuilding centre; Recuperating office; Renewal centre; Restoration centre; Recovery centre; Recovery focus | |
| Discharge summary | Release rundown; Release report; Release synopsis; Discharge papers; A reference; Discharge report; Release outline; Discharge outline; Discharge precis; Release summary; Release precis | Discharge brief; Summary of the circulation |
| Intuitive decision making | Instinctive focussed leadership; Instinctive basic leadership | |
| Nurse | Medical caretaker; Medical attendant; caretaker | |
| Ambulance | Rescue vehicle; Auto; Car; Affected person delivery; motorcar | |
| Rehabilitation | Encourage treatment | |
| Social worker | Social specialist; Public servant; Social employee | |
| Occupational Therapist | Word related authority; Word related advisor; Activity expert | |
| CAT scan | Feline output; ikon | CAT screening |
| Healthcare team | Medicinal services group; Aid team | |
| Diabetes | Polygenic disease | |
| Glucose levels | Aldohexose levels | |
| Medical record | Anamnesis; Scientific facts | |


Discussion

Although it is not within the scope of this brief exploratory study to state that there is a measurable difference in synonym substitution between paraphrasing tools and Google Translate™, the above results give a general indication of the observable differences.

When determining whether there is a potential breach in academic integrity, it is important to distinguish between extremely poor English skills, the use of a LOTE-to-English translation device, and the generation of text through a paraphrasing tool. Carter and Inkpen (2012, p.49) note “Machine translated text often seems to be intuitively identifiable by proficient speakers of a language”. If a student has used paraphrasing tools to alter a text to evade detection of plagiarism, then that act of evasion suggests that plagiarism has occurred. Word matching software such as Turnitin® (n.d.) has proven valuable in identifying replication of text from other sources. However, the very purpose of paraphrasing tools is to deceive software developed to detect plagiarism, and it is apparent that to date this strategy has been successful (Lancaster and Clarke 2009; Rogerson and McCarthy 2017; Shahid et al. 2017). Consequently, the burden of detection remains with the human reader, who has to become increasingly adept at spotting stylistic variations and any other flags relating to mechanisms that have been used to avoid detection (Gillam et al. 2010).

The method of detection we suggest, identifying the absence of expected nomenclature such as discipline-based terminology, could be considered an extrinsic analysis of the text. The expected immutables of recognised medical terms are substituted with synonyms, and thus treated by the paraphrasing tools as mutables. The paraphrased text is compared to an ideal or external text, that is, the text containing the medical terminology which was expected by the assessor. Shahid et al. (2017) propose a method of intrinsic analysis of paraphrased text through stylometric analysis:

We observe that style, language, grammatical constructs, and certain linguistic expressions in spun documents deviate from a human author because spinning software introduce artefacts in their output which are specific to a text spinner. (p. 5)

The technique described in their study involves the application of a number of algorithms to a selected text which can lead to identification of the source text. This level of analysis is not currently available to academic staff seeking to identify plagiarism committed through the use of paraphrasing tools. However, Turnitin® (n.d.) is developing an Authorship Investigation tool which will use stylometric and forensic linguistic analysis to provide measurement parameters indicative of authorship of a text (https://www.turnitin.com/solutions/authorship-investigation). Where there is suspicion that contract cheating has occurred, the Authorship Investigation tool will use examples of previous work submitted by a student to ascertain similarity of stylistic features to the work under suspicion. The premise is that a stylometric ‘fingerprint’ of the student’s literary style and expression can be used for comparison to submissions which may have been outsourced to another author. It is anticipated that this tool will be potentially useful in determining whether a submission has hallmarks which distinguish it from other pieces of writing by the student, but it will not be possible to identify the author of the outsourced work.
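The general idea of a stylometric ‘fingerprint’ can be sketched in a few lines of Python. This is a deliberately simple illustration and is not the method used by Shahid et al. (2017) or by the Turnitin® tool, neither of which is specified here. The short function-word list, the function names, and the choice of cosine similarity are all assumptions made for the example.

```python
import math
import re
from collections import Counter

# Small illustrative list of English function words (an assumption, not a validated feature set).
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is",
                  "it", "for", "with", "as", "but", "on", "not"]


def style_profile(text):
    """Relative frequency of each function word in text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero for empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def style_similarity(previous_work, submission):
    """Compare the function-word profiles of two texts by the same purported author."""
    return cosine_similarity(style_profile(previous_work), style_profile(submission))
```

A low score between a student's earlier work and a new submission would at most prompt closer reading. Real stylometric analysis draws on far more features and much longer samples than this sketch.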

In this exploratory study we identified linguistic features of spun text which indicated the use of paraphrasing tools. However, we were reliant on the curious case of the blessed messengers to point towards collusion. This was achieved through close collaboration by the marking staff, and until techniques for reverse engineering of paraphrased text become more widely available, “What ultimately leads to determinations of plagiarism is considerable manual analysis and subjective judgement” (Bretag and Mahmud 2009, p.54).
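The extrinsic check suggested earlier in this discussion, looking for expected discipline terms that are missing while a known machine substitute is present, could be sketched as follows. The term list is drawn from the results table. The function names and the example sentence are hypothetical, and the sketch is one possible aid to manual review, not a tool used in this study.

```python
import re

# Expected terms mapped to substitutes reported in the results table.
SUBSTITUTIONS = {
    "nurse": ["medical caretaker", "medical attendant"],
    "ambulance": ["rescue vehicle"],
    "social worker": ["social specialist", "social employee"],
    "occupational therapist": ["word related authority", "word related advisor"],
    "cat scan": ["feline output"],
    "diabetes": ["polygenic disease"],
    "glucose levels": ["aldohexose levels"],
}


def contains_phrase(text, phrase):
    """True if phrase occurs in text as whole words, ignoring case."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def flag_terminology(text, substitutions=SUBSTITUTIONS):
    """Return {expected term: [substitutes found]} where the term is absent but a substitute appears."""
    flags = {}
    for term, substitutes in substitutions.items():
        if contains_phrase(text, term):
            continue  # the expected term is used, so nothing to flag
        found = [s for s in substitutes if contains_phrase(text, s)]
        if found:
            flags[term] = found
    return flags


# Hypothetical essay sentence, written for this illustration only.
essay = "The medical caretaker checked the aldohexose levels before the feline output."
print(flag_terminology(essay))
```

A flag of this kind only indicates where to look. Deciding whether plagiarism has occurred still rests on the manual analysis and academic judgement described above.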

Conclusion

Students, and in particular those from an EAL background, experience significant challenges in conforming to academic conventions such as paraphrasing. The availability of free online paraphrasing tools may appear to them as a realistic solution to these challenges despite the word salad which is created by these tools. Whereas EAL students who write original work in their first language and then use online translation tools to convert this to English may be demonstrating poor academic practice, it can be argued that the submitted work is a result of their own intellectual endeavours. Unfortunately, students who use paraphrasing tools to spin text from undisclosed sources, thus evading word matching software, have committed an overt act of academic dishonesty.

In academic writing in the health science discipline, there is an expectation that standard medical terminology will be used. We noted the absence of this terminology in the students’ submissions and investigated the outputs of both paraphrasing tools and Google Translate™. In our sample, paraphrasing tools were far more likely to substitute inappropriate synonyms for accepted medical nomenclature, whereas Google Translate™ largely preserved these terms intact.

When paraphrasing tools have been applied to text, the output is frequently of such poor quality as to render the text unintelligible. We also noted the following features: the language generated will be notable for the use of unidiomatic words and phrases; expected vocabulary such as standard medical terminology will usually be substituted with inappropriate synonyms; word matching software, such as Turnitin® (n.d.), may not recognise the re-engineered text from the source and thus provide a low similarity index which may not be indicative of the actual level of plagiarism.

When online translation tools, such as Google Translate™, are used to convert text from a language other than English to English, discipline-specific nomenclature, such as standard medical terminology, is less likely to be changed than when paraphrasing tools are used.

This study demonstrates that there are a number of distinct features which can be identified in the text generated by paraphrasing tools. Awareness of these features will assist in the process of detecting plagiarism. While the emphasis should be on supporting students to develop the skills required to paraphrase appropriately, identifying linguistic markers which provide evidence of the use of paraphrasing tools will be of benefit in the overall management of breaches of academic integrity.

Notes

Funding

No funding was sought or provided for this study.

Availability of data and materials

No data or materials outside of those presented in the manuscript are held by the authors.

Authors’ contributions

FP 80%. CK 20%. Both authors have read and approved the final manuscript.

Authors’ information

Not applicable

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. Bailey J (2018) A brief history of article spinning. March 8, 2018 Plagiarism Today. https://www.plagiarismtoday.com/2018/03/08/a-brief-history-of-article-spinning/. Accessed 15 Aug 2018
  2. Bretag T (2007) The emperor's new clothes: yes, there is a link between English language competence and academic standards. People Place 15(1):13
  3. Bretag T, Mahmud S (2009) A model for determining student plagiarism: electronic detection and academic judgement. J Univ Teach Learn Pract 6(1):49–60
  4. Brimble M, Stevenson-Clarke P (2006) Managing academic dishonesty in Australian universities: implications for teaching, learning and scholarship. Account Account Perform 12(1):32–63
  5. Burke D, Sanney K (2018) Applying the fraud triangle to higher education: ethical implications. J Leg Stud Educ 35(1):5–43
  6. Carroll J (2015) Making decisions on management of plagiarism cases where there is a deliberate attempt to cheat. In: Bretag T (ed) Handbook of academic integrity. Springer, Singapore, pp 567–622
  7. Carter D, Inkpen D (2012) Searching for poor quality machine translated text: learning the difference between human writing and machine translations. In: Canadian conference on artificial intelligence. Springer, Heidelberg, pp 49–60
  8. Coren A (2011) Turning a blind eye: faculty who ignore student cheating. J Acad Ethics 9(4):291–305
  9. Correa M (2011) Academic dishonesty in the second language classroom: instructors’ perspectives. Mod J Lang Teach Methods 1(1):65–79
  10. Curtis GJ, Vardanega L (2016) Is plagiarism changing over time? A 10-year time-lag study with three points of measurement. High Educ Res Dev 35(6):1167–1179
  11. Day S, Williams H, Shelton J, Dozier G (2016) Towards the development of a Cyber Analysis & Advisement Tool (CAAT) for mitigating de-anonymization attacks. Paper presented at the 27th modern artificial intelligence and cognitive science conference, Dayton, Ohio, USA, April 22–23, 2016.
  12. Dingley C, Daugherty K, Derieg MK, Persing R (2008) Improving patient safety through provider communication strategy enhancements. In: Advances in Patient Safety: New Directions and Alternative Approaches, vol 3. Agency for Healthcare Research and Quality, USA
  13. Gillam L, Marinuzzi J, Ioannou P (2010) Turnitoff-defeating plagiarism detection systems. Subject Centre for Information and Computer Sciences. Available via http://epubs.surrey.ac.uk/790662/2/HEA-ICS_turnitoff.pdf. Accessed 16 Sept 2018
  14. Google Translate™ (n.d.) https://translate.google.com.au. Accessed 16 Aug 2018
  15. Handa N, Power C (2005) Land and discover! A case study investigating the cultural context of plagiarism. J Univ Teach Learn Pract 2(3):8
  16. Hyland K (2006) English for academic purposes. An advanced resource book. Routledge, London
  17. Howard RM (1999) Standing in the shadow of giants: Plagiarists, authors, collaborators (No. 2). Greenwood Publishing Group.
  18. Jamieson S (2015) Is it plagiarism or patchwriting? Toward a nuanced definition. In: Bretag T (ed) Handbook of academic integrity. Springer, Singapore
  19. Keith-Spiegel P, Tabachnick B, Whitley B, Washburn J (1998) Why professors ignore cheating: opinions of a national sample of psychology instructors. Ethics Behav 8(3):215–227
  20. Lancaster T, Clarke R (2006) Eliminating the successor to plagiarism? Identifying the usage of contract cheating sites. In: Proceedings of 2nd international plagiarism conference
  21. Lancaster T, Clarke R (2009) Automated essay spinning–an initial investigation, Paper presented at the 10th annual conference of the subject Centre for Information and Computer Sciences
  22. Le Q, Schuster M (2016) A neural network for machine translation, at production scale. Google AI Blog. Available via https://ai.googleblog.com/2016/09/a-neural-network-for-machine.html
  23. Marshall S, Garry M (2006) NESB and ESB students’ attitudes and perceptions of plagiarism. Int J Educ Integr 2(1):26–37
  24. Monsen K, Honey M, Wilson S (2010) Meaningful use of a standardized terminology to support the electronic health record in New Zealand. Appl Clin Inform 1(4):368
  25. Mundt K, Groves M (2016) A double-edged sword: the merits and the policy implications of Google translate™ in higher education. Eur J High Educ 6(4):387–401
  26. Paraphrasing Online (n.d.) https://www.paraphrasingonline.com. Accessed 14 Aug 2018
  27. Paraphrasing Tool (n.d.) https://paraphrasing-tool.com. Accessed 14 Aug 2018
  28. Pearson A, Aromataris E (2009) Patient safety in primary healthcare: a review of the literature. Australian Commission on Safety and Quality in Health Care, Adelaide
  29. Pecorari D (2003) Good and original: plagiarism and patchwriting in academic second-language writing. J Second Lang Writ 12(4):317–345
  30. Plagiarisma (n.d.) http://plagiarisma.net/spinner.php. Accessed 14 Aug 2018
  31. PrePostSEO (n.d.) https://www.prepostseo.com/free-online-paraphrasing-tool. Accessed 14 Aug 2018
  32. Rewriter Tools (n.d.) https://www.rewritertools.com/paraphrasing-tool. Accessed 14 Aug 2018
  33. Rogerson AM, McCarthy G (2017) Using internet based paraphrasing tools: original work, patchwriting or facilitated plagiarism? Int J Educ Integr 13(1):2CrossRefGoogle Scholar
  34. SEOMagnifier (n.d.) https://seomagnifier.com/online-paraphrasing-tool Accessed 14 Aug 2018
  35. Shahid U, Farooqi S, Ahmad R, Shafiq Z, Srinivasan P, Zaffar F (2017) Accurate Detection of Automatically Spun Content via Stylometric Analysis. In: Data Mining (ICDM), 2017 IEEE International Conference. IEEE, New OrleansGoogle Scholar
  36. Shi L (2006) Cultural backgrounds and textual appropriation. Lang Aware 15(4):264–282CrossRefGoogle Scholar
  37. Sutherland-Smith W (2005) Pandora’s box: academic perceptions of student plagiarism in writing. J Engl Acad Purp 4(1):83–95CrossRefGoogle Scholar
  38. Thomas A, De Bruin G (2012) Student academic dishonesty: what do academics think and do, and what are the barriers to action? Afr J Bus Ethics 6(1):13–24CrossRefGoogle Scholar
  39. Turnitin® (n.d.), Introducing Authorship Investigation. https://www.turnitin.com/solutions/authorship-investigation. Accessed 21 Sept 2018
  40. Zhang Q, Wang DY, Voelker GM (2014) DSpin: detecting automatically spun content on the web. NDSS. https://doi.org/10.14722/ndss.2014.23004

Copyright information

© The Author(s) 2018

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  1. La Trobe College Australia, Bundoora, Australia
