A New Dataset and Method for Creativity Assessment Using the Alternate Uses Task

Sun, Luning; Gu, Hongyi; Myers, Rebecca; Yuan, Zheng

doi:10.1007/978-981-97-0065-3_9

Part of the book series: Communications in Computer and Information Science (CCIS, volume 2036)

Included in the following conference series:

BenchCouncil International Symposium on Intelligent Computers, Algorithms, and Applications

Abstract

Human creativity ratings for the alternate uses task (AUT) tend to be subjective and inefficient to obtain. To automate AUT scoring, previous work has proposed using semantic distance computed with non-contextual models. In this paper, we extend this line of research by including contextual semantic models and, more importantly, by exploring the feasibility of predicting creativity ratings with supervised discriminative machine learning models. Based on a newly collected dataset, our results show that supervised models can successfully distinguish creative from non-creative responses, even with imbalanced data, and generalise well to unseen, out-of-domain prompts.
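The semantic distance baseline mentioned above (as in SemDis, Beaty and Johnson 2021) scores a response by how far it lies from the prompt object in an embedding space: one minus the cosine similarity of the two vectors. The sketch below is illustrative only and is not the authors' code. It assumes that multi-word texts are composed by averaging word vectors (one common additive choice, in the sense studied by Mitchell and Lapata), and it uses hypothetical three-dimensional toy vectors in place of a real non-contextual model such as those listed in Notes 9 to 11.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical toy vectors; a real setup would load a pre-trained
# non-contextual model (e.g. 300-dimensional GloVe or word2vec vectors).
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "bowl": [1.0, 0.0, 0.0],
    "hold": [0.8, 0.2, 0.0],
    "soup": [0.9, 0.1, 0.0],
    "wear": [0.0, 0.3, 0.9],
    "helmet": [0.1, 0.2, 0.9],
}


def mean_vector(tokens: Sequence[str], embeddings: Dict[str, List[float]]) -> List[float]:
    """Compose a text vector by averaging the vectors of in-vocabulary tokens."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        raise ValueError("none of the tokens is in the vocabulary")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_distance(prompt: str, response: str, embeddings: Dict[str, List[float]]) -> float:
    """Semantic distance = 1 - cosine similarity of the prompt and response vectors."""
    p = mean_vector(prompt.lower().split(), embeddings)
    r = mean_vector(response.lower().split(), embeddings)
    return 1.0 - cosine_similarity(p, r)


if __name__ == "__main__":
    # Out-of-vocabulary tokens ("as", "a") are simply skipped when averaging.
    for response in ["hold soup", "wear as a helmet"]:
        print(response, round(semantic_distance("bowl", response, TOY_EMBEDDINGS), 3))
```

With these toy vectors the common use ("hold soup") lands close to "bowl" and the unusual use ("wear as a helmet") lands far from it, which is the intuition behind using distance as an originality signal.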

Notes

1. https://chat.openai.com/.
2. https://github.com/ghydsgaaa/Cambridge-AUT-dataset.
3. https://discovermyprofile.com/.
4. Participants were not paid but were given the opportunity to opt into a draw to win one of ten £10 Amazon vouchers.
5. https://tfhub.dev/google/universal-sentence-encoder/4.
6. https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2.
7. https://huggingface.co/distilroberta-base.
8. https://beta.openai.com/docs/models/gpt-3.
9. glove-wiki-gigaword-300.
10. word2vec-google-news-300.
11. fasttext-wiki-news-subwords-300.
12. CFA (confirmatory factor analysis) is a statistical technique used to verify the factor structure of a set of observed variables and to test whether the hypothesised relationships between the observed variables and their underlying latent constructs hold.
13. Detailed CFA results are presented in Table 6, Sect. B.
14. https://huggingface.co/bert-base-uncased.
15. https://huggingface.co/roberta-base.
16. https://openai.com/blog/openai-api.
17. Per-class precision, recall and F1 scores are reported in Table 10, Sect. D.
18. The prompt we used for experiments with ChatGPT is provided in Sect. E.
19. Per-class precision, recall and F1 scores are reported in Table 11 and Table 12, Sect. F.
20. One viable solution is a voting ensemble that assigns weights to the outputs of both models, striking a balance between precision and recall. Alternatively, ChatGPT could be prompted to produce quantified outputs, and a threshold could then be set for comparing them with the outputs of the fine-tuned models.
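Note 20 only outlines the ensemble idea. The sketch below shows one way it could look as a soft weighted vote; it is an assumption for illustration, not the authors' method. The function name `weighted_vote`, its parameters, and the premise that ChatGPT has been prompted to return a probability-like score in [0, 1] are all hypothetical.

```python
def weighted_vote(p_finetuned: float, p_chatgpt: float,
                  weight: float = 0.5, threshold: float = 0.5) -> str:
    """Soft weighted vote over two scores that a response is creative.

    `weight` is the share given to the fine-tuned model's probability;
    the remaining 1 - weight goes to the (hypothetical) quantified ChatGPT score.
    """
    for name, value in (("p_finetuned", p_finetuned), ("p_chatgpt", p_chatgpt),
                        ("weight", weight), ("threshold", threshold)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    score = weight * p_finetuned + (1.0 - weight) * p_chatgpt
    return "creative" if score >= threshold else "non-creative"


if __name__ == "__main__":
    print(weighted_vote(0.9, 0.2))                 # equal weights
    print(weighted_vote(0.9, 0.2, weight=0.2))     # trust ChatGPT more
    print(weighted_vote(0.9, 0.2, threshold=0.6))  # stricter cut-off
```

Both knobs move the precision/recall balance: raising `threshold` makes "creative" predictions rarer, which would typically raise precision and lower recall for that class, while `weight` decides which model dominates when they disagree. In practice both would be tuned on held-out data.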

References

Amabile, T.M.: Social psychology of creativity: a consensual assessment technique. J. Pers. Soc. Psychol. 43(5), 997–1013 (1982)
Amabile, T.M.: The social psychology of creativity: a componential conceptualization. J. Pers. Soc. Psychol. 45(2), 357–376 (1983)
Ananiadou, K., Claro, M.: 21st century skills and competences for new millennium learners in OECD countries. In: OECD Education Working Papers (41) (2009). https://www.oecd-ilibrary.org/content/paper/218525261154
Andersen, Ø.E., Yuan, Z., Watson, R., Cheung, K.Y.F.: Benefits of alternative evaluation methods for automated essay scoring. In: Proceedings of the 14th International Conference on Educational Data Mining (EDM 2021), Paris, France (2021)
Beaty, R.E., Johnson, D.R.: Automating creativity assessment with SemDis: an open platform for computing semantic distance. Behav. Res. Methods 53(2), 757–780 (2021)
Beaty, R.E., et al.: Robust prediction of individual creative ability from brain functional connectivity. Proc. Natl. Acad. Sci. 115(5), 1087–1092 (2018)
Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017)
Brown, T., et al.: Language models are few-shot learners. In: Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901 (2020)
Cer, D., et al.: Universal sentence encoder. arXiv preprint arXiv:1803.11175 (2018)
Cseh, G.M., Jeffries, K.K.: A scattered CAT: a critical evaluation of the consensual assessment technique for creativity research. Psychol. Aesthet. Creat. Arts 13(2), 159–166 (2019)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)
Dumas, D., Organisciak, P., Doherty, M.: Measuring divergent thinking originality with human raters and text-mining models: a psychometric comparison of methods. Psychol. Aesthet. Creat. Arts 15(4), 645–663 (2021)
George, J.M., Zhou, J.: Dual tuning in a supportive context: joint contributions of positive mood, negative mood, and supervisory behaviors to employee creativity. Acad. Manag. J. 50(3), 605–622 (2007). https://doi.org/10.5465/AMJ.2007.25525934
Guilford, J.P.: The Nature of Human Intelligence. McGraw-Hill, New York, NY (1967)
Guilford, J.P.: Creative Talents: Their Nature, Uses and Development. Bearly Limited, Buffalo, NY (1986)
Ivancovsky, T., Shamay-Tsoory, S., Lee, J., Morio, H., Kurman, J.: A dual process model of generation and evaluation: a theoretical framework to examine cross-cultural differences in the creative process. Personal. Individ. Differ. 139, 60–68 (2019)
Kim, K.H.: Can we trust creativity tests? A review of the Torrance tests of creative thinking (TTCT). Creat. Res. J. 18(1), 3–14 (2006). https://doi.org/10.1207/s15326934crj1801_2
Kocmi, T., Federmann, C.: Large language models are state-of-the-art evaluators of translation quality. arXiv preprint arXiv:2302.14520 (2023)
Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019)
McCrae, R.R.: Creativity, divergent thinking, and openness to experience. J. Pers. Soc. Psychol. 52(6), 1258–1265 (1987)
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, vol. 26 (2013)
Mitchell, J., Lapata, M.: Composition in distributional models of semantics. Cogn. Sci. 34(8), 1388–1429 (2010)
Mouchiroud, C., Lubart, T.: Children’s original thinking: an empirical examination of alternative measures derived from divergent thinking tasks. J. Genet. Psychol. 162(4), 382–401 (2001)
Mumford, M.D.: Where have we been, where are we going? Taking stock in creativity research. Creat. Res. J. 15(2–3), 107–120 (2003)
Myers, R.J.: Measuring creative potential in higher education: the development and validation of a new psychometric test. Unpublished Master’s dissertation, University of Cambridge (2020)
Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)
Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084 (2019)
Sanh, V., Debut, L., Chaumond, J., Wolf, T.: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108 (2019)
Sottana, A., Liang, B., Zou, K., Yuan, Z.: Evaluation metrics in the era of GPT-4: reliably evaluating large language models on sequence to sequence tasks. In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP) (2023)
Susnea, I., Pecheanu, E., Costache, S.: Challenges of an e-learning platform for teaching creativity. In: Proceedings of the 11th International Scientific Conference eLearning and Software for Education, Bucharest, Romania (2015)
Torrance, E.P.: Torrance Tests of Creative Thinking - Norms Technical Manual Research Edition - Verbal Tests, Forms A and B - Figural Tests, Forms A and B. Personnel Press, Princeton, NJ (1966)
Zhuo, T.Y.: Large language models are state-of-the-art evaluators of code generation (2023)

Acknowledgement

We would like to thank all participants who took part in the AUT and all raters who annotated the responses. LS acknowledges financial support from Invesco through their philanthropic donation to Cambridge Judge Business School.

Author information

Authors and Affiliations

University of Cambridge, Cambridge, UK
Luning Sun & Rebecca Myers
NetMind.AI, London, UK
Hongyi Gu & Zheng Yuan
King’s College London, London, UK
Zheng Yuan

Corresponding author

Correspondence to Zheng Yuan.

Editor information

Editors and Affiliations

Université de Bourgogne, Dijon, France
Christophe Cruz
Victoria University, Melbourne, VIC, Australia
Yanchun Zhang
Chinese Academy of Sciences, Beijing, China
Wanling Gao

Appendices

A The Instructions Used for the AUT

General instruction: For the next four questions, there will be a time limit. For each task, please read the instructions and enter each possible answer separately by pressing the enter key after each one. If you run out of answers you may move on by pressing the next button, otherwise your question will automatically change after the allocated time.

Each task requires you to come up with as many different answers as possible. Try to be creative as there is no right or wrong answer.

Prompt 1: List as many different uses of a bowl as you can think of.

Prompt 2: Think of many different uses of a paperclip.

B Detailed CFA Results

Table 6. Latent correlations between human creativity ratings and semantic distance factors (Model\(_{\textbf{non-contextual}}\) and Model\(_{\textbf{contextual}}\)) on the Cambridge AUT dataset.

C Cross-Validation Results

Table 7. Fine-tuned BERT cross-validation results on the Cambridge AUT training sets. P: precision; R: recall.

Table 8. Fine-tuned RoBERTa cross-validation results on the Cambridge AUT training sets. P: precision; R: recall.

Table 9. Fine-tuned GPT-3 babbage cross-validation results on the Cambridge AUT training sets. P: precision; R: recall.

D Model Performance on the Dataset from [6]

Table 10. Prediction performance on the dataset from [6]. P: precision; R: recall.

E ChatGPT Prompt

You are a judge in the alternate uses task, where respondents are asked to list different uses for a common object. You will be presented with the object and a response that illustrates one of its uses. Please judge if the response is creative or non-creative. Inappropriate, invalid, irrelevant responses, and responses with common uses are considered non-creative, whereas appropriate, valid, novel and unusual uses are considered creative.

The object is: {prompt}

The response is: {response}

Please give your answer in “creative” or “non-creative”.

Your answer:
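The prompt above has two placeholders, {prompt} (the object) and {response} (one participant answer), and asks for a free-text verdict. The sketch below fills the template and maps the model's reply back to a label. It deliberately stops before any API call, and the helper names and the parsing rule are hypothetical illustrations rather than the authors' pipeline. The parser checks negated forms first because "non-creative" itself contains the word "creative", and it returns `None` for replies it cannot map.

```python
import re
from typing import Optional

# Template text copied from the prompt above; {prompt} and {response} are filled per item.
JUDGE_TEMPLATE = (
    "You are a judge in the alternate uses task, where respondents are asked to "
    "list different uses for a common object. You will be presented with the object "
    "and a response that illustrates one of its uses. Please judge if the response is "
    "creative or non-creative. Inappropriate, invalid, irrelevant responses, and "
    "responses with common uses are considered non-creative, whereas appropriate, "
    "valid, novel and unusual uses are considered creative.\n\n"
    "The object is: {prompt}\n\n"
    "The response is: {response}\n\n"
    "Please give your answer in \u201ccreative\u201d or \u201cnon-creative\u201d.\n\n"
    "Your answer:"
)

# Negated forms must be tested before the bare word "creative".
_NEGATIVE = re.compile(r"\b(?:non[\s-]?creative|not\s+creative)\b")
_POSITIVE = re.compile(r"\bcreative\b")


def build_judge_prompt(prompt: str, response: str) -> str:
    """Fill the judge template for one object/response pair."""
    return JUDGE_TEMPLATE.format(prompt=prompt, response=response)


def parse_label(answer: str) -> Optional[str]:
    """Map a free-text reply to 'creative' / 'non-creative', or None if unclear."""
    text = answer.strip().lower()
    if _NEGATIVE.search(text):
        return "non-creative"
    if _POSITIVE.search(text):
        return "creative"
    return None


if __name__ == "__main__":
    print(build_judge_prompt("bowl", "use it as a hat"))
    print(parse_label("Non-creative."))
```

Returning `None` instead of guessing keeps unparseable replies visible, so they can be re-queried or excluded explicitly when computing scores.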

F ChatGPT Classification Results

Table 11. ChatGPT results on the Cambridge AUT dataset. P: precision; R: recall.

Table 12. ChatGPT results on the dataset from [6]. P: precision; R: recall.
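Tables 7 to 12 report precision and recall per class rather than a single accuracy figure, which matters because the creative/non-creative labels are imbalanced. The sketch below computes per-class precision, recall and F1 from gold and predicted labels, treating each class in turn as the positive one. It is a generic illustration, not the authors' evaluation code; the function name and the toy labels are hypothetical.

```python
from typing import Dict, Sequence, Tuple

LABELS = ("creative", "non-creative")


def per_class_prf(gold: Sequence[str], pred: Sequence[str],
                  labels: Sequence[str] = LABELS) -> Dict[str, Tuple[float, float, float]]:
    """Return {label: (precision, recall, F1)}, treating each label in turn as positive."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = {}
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        # Undefined ratios (zero denominators) are reported as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


if __name__ == "__main__":
    # Toy imbalanced data: a classifier that always answers "non-creative".
    gold = ["creative"] + ["non-creative"] * 4
    pred = ["non-creative"] * 5
    for label, (p, r, f) in per_class_prf(gold, pred).items():
        print(f"{label}: P={p:.2f} R={r:.2f} F1={f:.2f}")
```

In the toy example the classifier is 80% accurate yet scores zero on every metric for the minority "creative" class, which is exactly what a single accuracy figure would hide.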

About this paper

Cite this paper

Sun, L., Gu, H., Myers, R., Yuan, Z. (2024). A New Dataset and Method for Creativity Assessment Using the Alternate Uses Task. In: Cruz, C., Zhang, Y., Gao, W. (eds) Intelligent Computers, Algorithms, and Applications. IC 2023. Communications in Computer and Information Science, vol 2036. Springer, Singapore. https://doi.org/10.1007/978-981-97-0065-3_9

DOI: https://doi.org/10.1007/978-981-97-0065-3_9
Published: 28 January 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-0064-6
Online ISBN: 978-981-97-0065-3
eBook Packages: Computer Science, Computer Science (R0)