
Detection of GPT-4 Generated Text in Higher Education: Combining Academic Judgement and Software to Identify Generative AI Tool Misuse

Published in: Journal of Academic Ethics

Abstract

This study explores the capability of academic staff, assisted by the Turnitin Artificial Intelligence (AI) detection tool, to identify the use of AI-generated content in university assessments. Twenty-two experimental submissions were produced using OpenAI’s ChatGPT tool, with prompting techniques applied to reduce the likelihood of AI detectors identifying AI-generated content. These submissions were marked by 15 academic staff members alongside genuine student submissions. Although the AI detection tool flagged 91% of the experimental submissions as containing AI-generated content, only 54.8% of the content within them was identified as AI-generated, underscoring the challenges of detecting AI content when advanced prompting techniques are used. When academic staff members marked the experimental submissions, only 54.5% were reported to the academic misconduct process, emphasising the need for greater awareness of how the results of AI detectors should be interpreted. Grades for the AI-generated content were similar to those for student submissions (AI mean grade: 52.3; student mean grade: 54.4), demonstrating the capability of AI tools to produce human-like responses in real-life assessment situations. Recommendations include adjusting overall strategies for assessing university students in light of the availability of new Generative AI tools. This may involve reducing reliance on assessments where AI tools can mimic human writing, or adopting AI-inclusive assessments. Comprehensive training must be provided for both academic staff and students so that academic integrity may be preserved.


Data Availability

The full commentaries left by academic staff members when marking the test submissions cannot be shared because of the confidentiality of the assessment results and associated comments. All other data generated or analysed during this study are included in this published article.


Acknowledgements

The authors are very grateful for the support of the academic staff who participated in the study and the ongoing support of the university Examinations Office team who provided technical assistance throughout the project.

Funding

No funding was received for this study.

Author information


Contributions

Mike Perkins conceived and designed the study. Data collection and analysis were performed by Mike Perkins, Darius Postma, James McGaughran, and Don Hickerson. The first draft of the manuscript was written by Mike Perkins, and all authors subsequently revised the manuscript. All authors have read and approved the final manuscript.

Corresponding author

Correspondence to Mike Perkins.

Ethics declarations

Competing interests

Mike Perkins, Darius Postma, James McGaughran, and Don Hickerson are currently employed by the university where the study took place. Jasper Roe was previously employed at the same university. This study was not connected to or funded by Turnitin.

Ethics Approval and Consent to Participate

The study was approved by the institution’s Human Ethics Committee before commencement, and all participants provided informed consent with the option to opt out of the study at any point in time.

LLM Usage

This study used Generative AI tools to produce draft text and revise wording throughout the production of the manuscript. Multiple versions of ChatGPT were used over different time periods, all based on the underlying GPT-4 Large Language Model. The authors reviewed, edited, and take responsibility for all outputs of the tools used in this study.

Preprint Publication

The initial version of this manuscript, prior to peer review, was posted on the arXiv preprint server and is available under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) licence at https://arxiv.org/abs/2305.18081.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Perkins, M., Roe, J., Postma, D. et al. Detection of GPT-4 Generated Text in Higher Education: Combining Academic Judgement and Software to Identify Generative AI Tool Misuse. J Acad Ethics 22, 89–113 (2024). https://doi.org/10.1007/s10805-023-09492-6

