Abstract
Human creativity ratings for the alternate uses task (AUT) tend to be subjective and inefficient. To automate the scoring process of the AUT, previous work has suggested using semantic distance computed from non-contextual models. In this paper, we extend this line of research by including contextual semantic models and, more importantly, by exploring the feasibility of predicting creativity ratings with supervised discriminative machine learning models. Based on a newly collected dataset, our results show that supervised models can successfully distinguish creative from non-creative responses even with unbalanced data, and can generalise well to unseen, out-of-domain prompts.
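As a rough illustration of the semantic-distance idea mentioned above, the sketch below scores a response by one minus the cosine similarity between an averaged prompt vector and an averaged response vector. The three-dimensional vectors, the `embed` helper and the averaging scheme are assumptions made for illustration; they are not the embeddings or the composition method used in the paper.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
# A real setup would load pretrained embeddings instead.
TOY_VECTORS = {
    "bowl": [0.9, 0.1, 0.0],
    "soup": [0.8, 0.2, 0.1],
    "hat": [0.1, 0.9, 0.2],
    "drum": [0.0, 0.3, 0.9],
}


def embed(text):
    """Average the vectors of the known words in `text`."""
    vectors = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vectors:
        raise ValueError(f"no known words in {text!r}")
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def semantic_distance(prompt, response):
    """Return 1 - cosine similarity between prompt and response vectors."""
    p, r = embed(prompt), embed(response)
    dot = sum(a * b for a, b in zip(p, r))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in r))
    return 1.0 - dot / norm
```

With these toy vectors, `semantic_distance("bowl", "drum")` is larger than `semantic_distance("bowl", "soup")`, matching the intuition that a more remote use is further from the prompt.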
Notes
- 4. Participants were not paid but were given the opportunity to opt into a draw to win one of ten £10 Amazon vouchers.
- 9. glove-wiki-gigaword-300.
- 10. word2vec-google-news-300.
- 11. fasttext-wiki-news-subwords-300.
- 12. Confirmatory factor analysis (CFA) is a statistical technique used to verify the factor structure of a set of observed variables and to test whether the hypothesised relationships between the observed variables and their underlying latent constructs hold.
- 18. The prompt we used for experiments with ChatGPT is provided in Sect. E.
- 20. One viable solution is to employ a voting ensemble technique, which assigns weights to the results of both models in order to strike a balance between precision and recall. Alternatively, we could prompt ChatGPT to generate quantified results and establish a threshold for comparing its outputs with those of the fine-tuned models.
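The weighted voting idea in the final note can be sketched as follows. This is a minimal illustration, assuming both the fine-tuned model and ChatGPT produce a score in [0, 1] for the "creative" class; the function name `weighted_vote`, the default weights and the 0.5 threshold are hypothetical choices, not values reported in the paper.

```python
def weighted_vote(p_finetuned, p_llm, w_finetuned=0.5, w_llm=0.5, threshold=0.5):
    """Combine two 'creative' scores with a weighted average.

    p_finetuned, p_llm: scores in [0, 1] from the two models (assumed inputs).
    w_finetuned, w_llm: non-negative weights (hypothetical defaults).
    threshold: combined score at or above which the label is 'creative'.
    Returns (label, combined_score).
    """
    if w_finetuned < 0 or w_llm < 0 or (w_finetuned + w_llm) == 0:
        raise ValueError("weights must be non-negative and not both zero")
    # Normalise by the weight sum so the combined score stays in [0, 1].
    score = (w_finetuned * p_finetuned + w_llm * p_llm) / (w_finetuned + w_llm)
    label = "creative" if score >= threshold else "non-creative"
    return label, score
```

Shifting weight towards the model with higher precision, or raising the threshold, would trade recall for precision; the appropriate values would need to be tuned on held-out data.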
References
Amabile, T.M.: Social psychology of creativity: a consensual assessment technique. J. Pers. Soc. Psychol. 43(5), 997–1013 (1982)
Amabile, T.M.: The social psychology of creativity: a componential conceptualization. J. Pers. Soc. Psychol. 45(2), 357–376 (1983)
Ananiadou, K., Claro, M.: 21st century skills and competences for new millennium learners in OECD countries. In: OECD Education Working Papers (41) (2009). https://www.oecd-ilibrary.org/content/paper/218525261154
Andersen, Ø.E., Yuan, Z., Watson, R., Cheung, K.Y.F.: Benefits of alternative evaluation methods for automated essay scoring. In: Proceedings of the 14th International Conference on Educational Data Mining (EDM 2021), Paris, France (2021)
Beaty, R.E., Johnson, D.R.: Automating creativity assessment with SemDis: an open platform for computing semantic distance. Behav. Res. Methods 53(2), 757–780 (2021)
Beaty, R.E., et al.: Robust prediction of individual creative ability from brain functional connectivity. Proc. Natl. Acad. Sci. 115(5), 1087–1092 (2018)
Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017)
Brown, T., et al.: Language models are few-shot learners. In: Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901 (2020)
Cer, D., et al.: Universal sentence encoder. arXiv preprint arXiv:1803.11175 (2018)
Cseh, G.M., Jeffries, K.K.: A scattered CAT: a critical evaluation of the consensual assessment technique for creativity research. Psychol. Aesthet. Creat. Arts 13(2), 159–166 (2019)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Dumas, D., Organisciak, P., Doherty, M.: Measuring divergent thinking originality with human raters and text-mining models: a psychometric comparison of methods. Psychol. Aesthet. Creat. Arts 15(4), 645–663 (2021)
George, J.M., Zhou, J.: Dual tuning in a supportive context: joint contributions of positive mood, negative mood, and supervisory behaviors to employee creativity. Acad. Manag. J. 50(3), 605–622 (2007). https://doi.org/10.5465/AMJ.2007.25525934
Guilford, J.P.: The Nature of Human Intelligence. McGraw-Hill, New York, NY (1967)
Guilford, J.P.: Creative Talents: Their Nature, Uses and Development. Bearly Limited, Buffalo, NY (1986)
Ivancovsky, T., Shamay-Tsoory, S., Lee, J., Morio, H., Kurman, J.: A dual process model of generation and evaluation: a theoretical framework to examine cross-cultural differences in the creative process. Personal. Individ. Differ. 139, 60–68 (2019)
Kim, K.H.: Can we trust creativity tests? A review of the Torrance tests of creative thinking (TTCT). Creat. Res. J. 18(1), 3–14 (2006). https://doi.org/10.1207/s15326934crj1801_2
Kocmi, T., Federmann, C.: Large language models are state-of-the-art evaluators of translation quality. arXiv preprint arXiv:2302.14520 (2023)
Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019)
McCrae, R.R.: Creativity, divergent thinking, and openness to experience. J. Pers. Soc. Psychol. 52(6), 1258–1265 (1987)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, vol. 26 (2013)
Mitchell, J., Lapata, M.: Composition in distributional models of semantics. Cogn. Sci. 34(8), 1388–1429 (2010)
Mouchiroud, C., Lubart, T.: Children’s original thinking: an empirical examination of alternative measures derived from divergent thinking tasks. J. Genet. Psychol. 162(4), 382–401 (2001)
Mumford, M.D.: Where have we been, where are we going? Taking stock in creativity research. Creat. Res. J. 15(2–3), 107–120 (2003)
Myers, R.J.: Measuring creative potential in higher education: the development and validation of a new psychometric test (2020). Unpublished Master’s dissertation, University of Cambridge
Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)
Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084 (2019)
Sanh, V., Debut, L., Chaumond, J., Wolf, T.: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108 (2019)
Sottana, A., Liang, B., Zou, K., Yuan, Z.: Evaluation metrics in the era of GPT-4: reliably evaluating large language models on sequence to sequence tasks. In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP) (2023)
Susnea, I., Pecheanu, E., Costache, S.: Challenges of an e-learning platform for teaching creativity. In: Proceedings of the 11th International Scientific Conference eLearning and Software for Education, Bucharest, Romania (2015)
Torrance, E.P.: Torrance Tests of Creative Thinking - Norms Technical Manual Research Edition - Verbal Tests, Forms A and B - Figural Tests, Forms A and B. Personnel Press, Princeton, NJ (1966)
Zhuo, T.Y.: Large language models are state-of-the-art evaluators of code generation (2023)
Acknowledgement
We would like to thank all participants who took part in the AUT and all raters who annotated the responses. LS acknowledges financial support from Invesco through their philanthropic donation to Cambridge Judge Business School.
Appendices
A The Instructions Used for the AUT
General instruction: For the next four questions, there will be a time limit. For each task, please read the instructions and enter each possible answer separately by pressing the enter key after each one. If you run out of answers you may move on by pressing the next button, otherwise your question will automatically change after the allocated time.
Each task requires you to come up with as many different answers as possible. Try to be creative as there is no right or wrong answer.
Prompt 1: List as many different uses of a bowl as you can think of.
Prompt 2: Think of many different uses of a paperclip.
B Detailed CFA Results
C Cross Validation Results
D Model Performance on the Dataset from [6]
E ChatGPT Prompt
You are a judge in the alternate uses task, where respondents are asked to list different uses for a common object. You will be presented with the object and a response that illustrates one of its uses. Please judge if the response is creative or non-creative. Inappropriate, invalid, irrelevant responses, and responses with common uses are considered non-creative, whereas appropriate, valid, novel and unusual uses are considered creative.
The object is: {prompt}
The response is: {response}
Please give your answer in “creative” or “non-creative”.
Your answer:
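The `{prompt}` and `{response}` placeholders above can be filled programmatically. The sketch below is one possible way to do this with Python string formatting and to map a free-text reply onto the two labels; the helper names `build_prompt` and `parse_answer` are hypothetical, and the call to the model itself is deliberately left out.

```python
PROMPT_TEMPLATE = (
    "You are a judge in the alternate uses task, where respondents are asked "
    "to list different uses for a common object. You will be presented with "
    "the object and a response that illustrates one of its uses. Please judge "
    "if the response is creative or non-creative. Inappropriate, invalid, "
    "irrelevant responses, and responses with common uses are considered "
    "non-creative, whereas appropriate, valid, novel and unusual uses are "
    "considered creative.\n"
    "The object is: {prompt}\n"
    "The response is: {response}\n"
    "Please give your answer in “creative” or “non-creative”.\n"
    "Your answer:"
)


def build_prompt(obj, response):
    """Fill the template with the AUT object and one participant response."""
    return PROMPT_TEMPLATE.format(prompt=obj, response=response)


def parse_answer(answer):
    """Map a free-text reply to a label, or None if neither label appears.

    'non-creative' is checked first because it contains 'creative'.
    """
    text = answer.strip().lower()
    if "non-creative" in text:
        return "non-creative"
    if "creative" in text:
        return "creative"
    return None
```

For example, `build_prompt("bowl", "a helmet for a cat")` yields the full prompt text, and `parse_answer("Non-creative.")` returns `"non-creative"`.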
F ChatGPT Classification Results
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Sun, L., Gu, H., Myers, R., Yuan, Z. (2024). A New Dataset and Method for Creativity Assessment Using the Alternate Uses Task. In: Cruz, C., Zhang, Y., Gao, W. (eds) Intelligent Computers, Algorithms, and Applications. IC 2023. Communications in Computer and Information Science, vol 2036. Springer, Singapore. https://doi.org/10.1007/978-981-97-0065-3_9
DOI: https://doi.org/10.1007/978-981-97-0065-3_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0064-6
Online ISBN: 978-981-97-0065-3
eBook Packages: Computer Science, Computer Science (R0)