A New Dataset and Method for Creativity Assessment Using the Alternate Uses Task

  • Conference paper
Intelligent Computers, Algorithms, and Applications (IC 2023)

Abstract

Human creativity ratings for the alternate uses task (AUT) tend to be subjective and inefficient. To automate AUT scoring, previous literature suggested using semantic distance from non-contextual models. In this paper, we extend this line of research by including contextual semantic models and, more importantly, by exploring the feasibility of predicting creativity ratings with supervised discriminative machine learning models. Based on a newly collected dataset, our results show that supervised models can successfully distinguish creative from non-creative responses even with unbalanced data, and can generalise well to out-of-domain unseen prompts.
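
As a rough illustration of the semantic distance idea mentioned in the abstract, the sketch below scores a response by one minus the cosine similarity between a prompt vector and a response vector. The three-dimensional vectors and the `TOY_EMBEDDINGS` table are invented for illustration and are not taken from the paper; a real setup would presumably obtain vectors from one of the embedding models listed in the Notes.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical 3-d embeddings, made up for this example only.
TOY_EMBEDDINGS = {
    "bowl": [0.9, 0.1, 0.0],
    "eat soup": [0.8, 0.2, 0.1],
    "wear as a helmet": [0.1, 0.3, 0.9],
}


def semantic_distance(prompt, response, embeddings=TOY_EMBEDDINGS):
    """Distance between the prompt vector and the response vector."""
    return cosine_distance(embeddings[prompt], embeddings[response])
```

With these toy vectors, the common use ("eat soup") lies closer to the prompt than the unusual one ("wear as a helmet"), which is the intuition behind using distance as a proxy for originality.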

Notes

  1. https://chat.openai.com/.

  2. https://github.com/ghydsgaaa/Cambridge-AUT-dataset.

  3. https://discovermyprofile.com/.

  4. Participants were not paid but given the opportunity to opt into a draw to win one of ten £10 Amazon vouchers.

  5. https://tfhub.dev/google/universal-sentence-encoder/4.

  6. https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2.

  7. https://huggingface.co/distilroberta-base.

  8. https://beta.openai.com/docs/models/gpt-3.

  9. glove-wiki-gigaword-300.

  10. word2vec-google-news-300.

  11. fasttext-wiki-news-subwords-300.

  12. CFA is a statistical technique used to verify the factor structure of a set of observed variables and to test whether the hypothesised relationships between the observed variables and their underlying latent constructs exist.

  13. Detailed CFA results are presented in Table 6, Sect. B.

  14. https://huggingface.co/bert-base-uncased.

  15. https://huggingface.co/roberta-base.

  16. https://openai.com/blog/openai-api.

  17. Per-class precision, recall and F1 scores are reported in Table 10, Sect. D.

  18. The prompt we used for experiments with ChatGPT is provided in Sect. E.

  19. Per-class precision, recall and F1 scores are reported in Table 11 and Table 12, Sect. F.

  20. One viable solution is employing a voting ensemble technique, which involves assigning weights to the results of both models and striking a balance between precision and recall. Alternatively, we could prompt ChatGPT to generate quantified results and establish a threshold for comparing its outputs with those of the fine-tuned models.
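
The weighted voting idea in note 20 could be sketched as follows. This is only a minimal illustration under assumptions of our own: both models are taken to emit a "creative" score in [0, 1] (a ChatGPT label could be mapped to 1.0 or 0.0), and the function name `weighted_vote`, the default weights and the 0.5 threshold are hypothetical rather than values from the paper.

```python
def weighted_vote(finetuned_score, chatgpt_score,
                  w_finetuned=0.5, w_chatgpt=0.5, threshold=0.5):
    """Combine two 'creative' scores by weighted average, then threshold.

    All weights and the threshold are illustrative assumptions; in practice
    they would be tuned on held-out data to trade precision against recall.
    """
    for score in (finetuned_score, chatgpt_score):
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must lie in [0, 1]")
    total = w_finetuned + w_chatgpt
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    combined = (w_finetuned * finetuned_score + w_chatgpt * chatgpt_score) / total
    label = "creative" if combined >= threshold else "non-creative"
    return label, combined
```

Raising the threshold, or shifting weight towards the more conservative model, would favour precision on the creative class; lowering it would favour recall.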

References

  1. Amabile, T.M.: Social psychology of creativity: a consensual assessment technique. J. Pers. Soc. Psychol. 43(5), 997–1013 (1982)

  2. Amabile, T.M.: The social psychology of creativity: a componential conceptualization. J. Pers. Soc. Psychol. 45(2), 357–376 (1983)

  3. Ananiadou, K., Claro, M.: 21st century skills and competences for new millennium learners in OECD countries. In: OECD Education Working Papers (41) (2009). https://www.oecd-ilibrary.org/content/paper/218525261154

  4. Andersen, Ø.E., Yuan, Z., Watson, R., Cheung, K.Y.F.: Benefits of alternative evaluation methods for automated essay scoring. In: Proceedings of the 14th International Conference on Educational Data Mining (EDM 2021), Paris, France (2021)

  5. Beaty, R.E., Johnson, D.R.: Automating creativity assessment with SemDis: an open platform for computing semantic distance. Behav. Res. Methods 53(2), 757–780 (2021)

  6. Beaty, R.E., et al.: Robust prediction of individual creative ability from brain functional connectivity. Proc. Natl. Acad. Sci. 115(5), 1087–1092 (2018)

  7. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017)

  8. Brown, T., et al.: Language models are few-shot learners. In: Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901 (2020)

  9. Cer, D., et al.: Universal sentence encoder. arXiv preprint arXiv:1803.11175 (2018)

  10. Cseh, G.M., Jeffries, K.K.: A scattered CAT: a critical evaluation of the consensual assessment technique for creativity research. Psychol. Aesthet. Creat. Arts 13(2), 159–166 (2019)

  11. Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018)

  12. Dumas, D., Organisciak, P., Doherty, M.: Measuring divergent thinking originality with human raters and text-mining models: a psychometric comparison of methods. Psychol. Aesthet. Creat. Arts 15(4), 645–663 (2021)

  13. George, J.M., Zhou, J.: Dual tuning in a supportive context: joint contributions of positive mood, negative mood, and supervisory behaviors to employee creativity. Acad. Manag. J. 50(3), 605–622 (2007). https://doi.org/10.5465/AMJ.2007.25525934

  14. Guilford, J.P.: The Nature of Human Intelligence. McGraw-Hill, New York, NY (1967)

  15. Guilford, J.P.: Creative Talents: Their Nature, Uses and Development. Bearly Limited, Buffalo, NY (1986)

  16. Ivancovsky, T., Shamay-Tsoory, S., Lee, J., Morio, H., Kurman, J.: A dual process model of generation and evaluation: a theoretical framework to examine cross-cultural differences in the creative process. Personal. Individ. Differ. 139, 60–68 (2019)

  17. Kim, K.H.: Can we trust creativity tests? A review of the Torrance tests of creative thinking (TTCT). Creat. Res. J. 18(1), 3–14 (2006). https://doi.org/10.1207/s15326934crj1801_2

  18. Kocmi, T., Federmann, C.: Large language models are state-of-the-art evaluators of translation quality. arXiv preprint arXiv:2302.14520 (2023)

  19. Liu, Y., et al.: RoBERTa: a robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019)

  20. McCrae, R.R.: Creativity, divergent thinking, and openness to experience. J. Pers. Soc. Psychol. 52(6), 1258–1265 (1987)

  21. Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, vol. 26 (2013)

  22. Mitchell, J., Lapata, M.: Composition in distributional models of semantics. Cogn. Sci. 34(8), 1388–1429 (2010)

  23. Mouchiroud, C., Lubart, T.: Children’s original thinking: an empirical examination of alternative measures derived from divergent thinking tasks. J. Genet. Psychol. 162(4), 382–401 (2001)

  24. Mumford, M.D.: Where have we been, where are we going? Taking stock in creativity research. Creat. Res. J. 15(2–3), 107–120 (2003)

  25. Myers, R.J.: Measuring creative potential in higher education: the development and validation of a new psychometric test (2020). Unpublished Master’s dissertation, University of Cambridge

  26. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)

  27. Reimers, N., Gurevych, I.: Sentence-BERT: sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084 (2019)

  28. Sanh, V., Debut, L., Chaumond, J., Wolf, T.: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108 (2019)

  29. Sottana, A., Liang, B., Zou, K., Yuan, Z.: Evaluation metrics in the era of GPT-4: reliably evaluating large language models on sequence to sequence tasks. In: Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP) (2023)

  30. Susnea, I., Pecheanu, E., Costache, S.: Challenges of an e-learning platform for teaching creativity. In: Proceedings of the 11th International Scientific Conference eLearning and Software for Education. Bucharest, Romania, April 2015

  31. Torrance, E.P.: Torrance Tests of Creative Thinking - Norms Technical Manual Research Edition - Verbal Tests, Forms A and B - Figural Tests, Forms A and B. Personnel Press, Princeton, NJ (1966)

  32. Zhuo, T.Y.: Large language models are state-of-the-art evaluators of code generation (2023)

Acknowledgement

We would like to thank all participants who took part in the AUT and all raters who annotated the responses. LS acknowledges financial support from Invesco through their philanthropic donation to Cambridge Judge Business School.

Author information

Corresponding author

Correspondence to Zheng Yuan.

Appendices

A The Instructions Used for the AUT

General instruction: For the next four questions, there will be a time limit. For each task, please read the instructions and enter each possible answer separately by pressing the enter key after each one. If you run out of answers you may move on by pressing the next button, otherwise your question will automatically change after the allocated time.

Each task requires you to come up with as many different answers as possible. Try to be creative as there is no right or wrong answer.

Prompt 1: List as many different uses of a bowl as you can think of.

Prompt 2: Think of many different uses of a paperclip.

B Detailed CFA Results

Table 6. Latent correlations between human creativity ratings and semantic distance factors (Model\(_{\textbf{non-contextual}}\) and Model\(_{\textbf{contextual}}\)) on the Cambridge AUT dataset.

Fig. 1. CFA diagram of Model\(_{\textbf{non-contextual}}\) on the Cambridge AUT dataset. r1-3: rater1-3; glv: GloVe; w2v: Word2vec; fst: fastText; HCR: human creativity rating factor; NSD: non-contextual semantic distance factor.

Fig. 2. CFA diagram of Model\(_{\textbf{contextual}}\) on the Cambridge AUT dataset. r1-3: rater1-3; uni: Universal Sentence Encoder; sen: Sentence-Transformers; rbt: RoBERTa; gpt: GPT-3; HCR: human creativity rating factor; CSD: contextual semantic distance factor.

C Cross Validation Results

Table 7. Fine-tuned BERT cross validation results on the Cambridge AUT training sets. P: precision; R: recall.
Table 8. Fine-tuned RoBERTa cross validation results on the Cambridge AUT training sets. P: precision; R: recall.
Table 9. Fine-tuned GPT-3 babbage cross validation results on the Cambridge AUT training sets. P: precision; R: recall.

D Model Performance on the Dataset from [6]

Table 10. Prediction performance on the dataset from [6]. P: precision; R: recall.

E ChatGPT Prompt

You are a judge in the alternate uses task, where respondents are asked to list different uses for a common object. You will be presented with the object and a response that illustrates one of its uses. Please judge if the response is creative or non-creative. Inappropriate, invalid, irrelevant responses, and responses with common uses are considered non-creative, whereas appropriate, valid, novel and unusual uses are considered creative.

The object is: {prompt}

The response is: {response}

Please give your answer in “creative” or “non-creative”.

Your answer:
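
For readers who want to reuse the prompt above, the following sketch fills the `{prompt}` and `{response}` placeholders and normalises a free-text reply into one of the two labels. The helper names `build_prompt` and `parse_answer` are hypothetical, and no API call is shown; how the filled prompt was actually sent to ChatGPT, and how replies were post-processed, is not specified in this section.

```python
CHATGPT_PROMPT = (
    "You are a judge in the alternate uses task, where respondents are asked "
    "to list different uses for a common object. You will be presented with "
    "the object and a response that illustrates one of its uses. Please judge "
    "if the response is creative or non-creative. Inappropriate, invalid, "
    "irrelevant responses, and responses with common uses are considered "
    "non-creative, whereas appropriate, valid, novel and unusual uses are "
    "considered creative.\n\n"
    "The object is: {prompt}\n\n"
    "The response is: {response}\n\n"
    "Please give your answer in “creative” or “non-creative”.\n\n"
    "Your answer:"
)


def build_prompt(obj, response):
    """Fill the template with one AUT object and one participant response."""
    return CHATGPT_PROMPT.format(prompt=obj, response=response)


def parse_answer(text):
    """Map a free-text reply to 'creative', 'non-creative', or None.

    'non-creative' is checked first because it contains 'creative'.
    """
    normalised = text.strip().lower().strip(" \t\n\"'“”.")
    if normalised.startswith(("non-creative", "non creative", "noncreative")):
        return "non-creative"
    if normalised.startswith("creative"):
        return "creative"
    return None  # reply did not follow the requested format
```

Returning `None` for unparseable replies keeps such cases visible rather than silently assigning them to a class.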

F ChatGPT Classification Results

Table 11. ChatGPT results on the Cambridge AUT dataset. P: precision; R: recall.
Table 12. ChatGPT results on the dataset from [6]. P: precision; R: recall.
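
The per-class precision (P), recall (R) and F1 values reported in the appendix tables follow the standard definitions, which can be sketched as below. The function name `per_class_prf` and the choice to return 0.0 when a denominator is zero are assumptions of this sketch, not details taken from the paper.

```python
def per_class_prf(gold, pred, labels=("creative", "non-creative")):
    """Compute precision, recall and F1 separately for each class label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores = {}
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        # Convention assumed here: undefined ratios are reported as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = {"P": precision, "R": recall, "F1": f1}
    return scores
```

Reporting the scores per class matters on unbalanced data, where overall accuracy can look high even if the minority class is rarely predicted correctly.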

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Sun, L., Gu, H., Myers, R., Yuan, Z. (2024). A New Dataset and Method for Creativity Assessment Using the Alternate Uses Task. In: Cruz, C., Zhang, Y., Gao, W. (eds) Intelligent Computers, Algorithms, and Applications. IC 2023. Communications in Computer and Information Science, vol 2036. Springer, Singapore. https://doi.org/10.1007/978-981-97-0065-3_9

  • DOI: https://doi.org/10.1007/978-981-97-0065-3_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-0064-6

  • Online ISBN: 978-981-97-0065-3

  • eBook Packages: Computer Science, Computer Science (R0)
