Few-shot is enough: exploring ChatGPT prompt engineering method for automatic question generation in english education

Education and Information Technologies

Abstract

Through design and development research (DDR), we aimed to create a validated automatic question generation (AQG) system using large language models (LLMs) like ChatGPT, enhanced by prompt engineering techniques. While AQG has become increasingly integral to online learning for its efficiency in generating questions, issues such as inconsistent question quality and the absence of transparent and validated evaluation methods persist. Our research focused on creating a prompt engineering protocol tailored for AQG. This protocol underwent several iterations of refinement and validation to improve its performance. By gathering validation scores and qualitative feedback on the produced questions and the system’s framework, we examined the effectiveness of the system. The study findings indicate that our combined use of LLMs and prompt engineering in AQG produces questions with statistically significant validity. Our research further illuminates academic and design considerations for AQG design in English education: (a) certain question types might not be optimal for generation via ChatGPT, and (b) ChatGPT sheds light on the potential for collaborative AI-teacher efforts in question generation, especially within English education.

Data Availability

The data supporting the findings of this study exist but cannot be made publicly available because they include personal interviews and sensitive material. The authors will evaluate data access requests individually, taking into account whether the request is reasonable and whether the proper rights have been obtained.

Notes

  1. This link points to the AQG manual used in this research. https://docs.google.com/document/d/1h23DtAVeKHd1AiUvTlpVN3AG073xRg4vygpu4-o82-s/edit?usp=sharing

  2. This link points to the AQG manual used in this research. https://docs.google.com/document/d/1h23DtAVeKHd1AiUvTlpVN3AG073xRg4vygpu4-o82-s/edit?usp=sharing

  3. This link points to the AQG example output in this research. https://docs.google.com/document/d/1h23DtAVeKHd1AiUvTlpVN3AG073xRg4vygpu4-o82-s/edit?usp=sharing

Acknowledgements

This work was supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2020-0-00368, A Neural-Symbolic Model for Knowledge Acquisition and Inference Techniques).

Author information

Corresponding authors

Correspondence to Unggi Lee or Hyeoncheol Kim.

Ethics declarations

Conflict of Interest

None.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix 1

An appendix contains supplementary information that is not an essential part of the text itself but that may help provide a more comprehensive understanding of the research problem, or information that is too cumbersome to include in the body of the paper. We therefore provide a link to the full manual in a footnote (Footnote 2).

1.1 Introduction to ChatGPT

ChatGPT is a powerful tool developed by OpenAI that can understand and generate human-like responses in conversations. It can help generate questions for different purposes, including educational assessments. ChatGPT works by learning from a large amount of text data, which helps it understand grammar, vocabulary, and the meaning of words in different contexts. This learning process is analogous to training a brain to understand language.

Once ChatGPT has learned from the training data, it can be fine-tuned for specific tasks like generating questions. During the fine-tuning process, it learns to generate questions that are relevant and appropriate based on a given passage or topic.

To use ChatGPT for question generation, you provide it with a passage of text, and it uses its knowledge and understanding of language to create questions that test the reader's comprehension of the material. It tries to generate questions that make sense based on the information in the passage.
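As a minimal sketch of this step, the snippet below combines a passage and a task instruction into one prompt string, following the "Passage / instruction" layout that appears in the example outputs of this appendix. The function name `build_prompt` and the reading of the slash as a literal separator are assumptions made for illustration; the request to ChatGPT itself is left out.

```python
def build_prompt(passage: str, instruction: str) -> str:
    """Join a reading passage and a task instruction as 'passage / instruction'.

    Hypothetical helper: the ' / ' separator mirrors the layout of the
    example prompts in this appendix and is an assumption, not a rule.
    """
    passage = " ".join(passage.split())  # collapse line breaks and repeated spaces
    instruction = instruction.strip()
    if not passage or not instruction:
        raise ValueError("both passage and instruction are required")
    return f"{passage} / {instruction}"


# Instruction wording taken from Example 1; the passage is shortened here.
passage = "The public is only modestly science literate."
instruction = (
    "make a yes-no question of identifying information that is explicitly "
    "shown in the text. After questions, give 'yes or no' choice option"
)
prompt = build_prompt(passage, instruction)
print(prompt)
```

The resulting string can be pasted into the ChatGPT interface as a single message; keeping the passage and the instruction as separate arguments makes it easy to reuse one passage with several question types.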

However, it's important to know that ChatGPT has some limitations. Sometimes it may generate incorrect or nonsensical responses, and it can have difficulty understanding complex or ambiguous queries. Also, it relies on patterns it has learned from the training data and may generate responses that sound plausible but are actually incorrect.

Despite these limitations, ChatGPT has great potential as a tool for generating questions. When its responses are used carefully and validated, it can assist in creating contextually appropriate and varied questions, making assessments more effective and efficient.

In summary, ChatGPT is an exciting tool that can understand and generate human-like responses. It can be used to generate questions that test understanding, but it's important to be aware of its limitations and use it wisely in order to benefit from its capabilities in question generation.

1.2 Example passage

The most plausible explanation for the science communication problem is the public’s limited capacity to comprehend science. The public is only modestly science literate. About half, we are regularly reminded, understand that the earth orbits the sun in a year as opposed to a day; less than a quarter knows that nitrogen is the most common gas in the earth’s atmosphere; less than 10% can make sense of a two-by-two contingency table essential to determining the ratio of true to false positives when assessing medical test results. So how can members of the public possibly be expected to understand what scientists are saying when scientists try to explain complex issues like climate change or nuclear power? More importantly, members of the public do not think the way scientists do. They rely on rapid, intuitive, affect-driven sources of information processing to the exclusion of the deliberate, conscious, analytic ones essential to making appropriate judgments of risk. As a result, they tend to overestimate the magnitude of more emotionally charged disasters (e.g., terrorist attacks) and discount more consequential but more temporally or emotionally remote ones (e.g., the impact of human-caused climate change). They also are more likely to rely on defective heuristics, such as crediting the opinions of their peers, a form of reasoning that can trigger self-reinforcing states of polarization.

1.3 Example output

We also provide a link to the full example outputs in a footnote (Footnote 3).

1.3.1 Example 1

A1

Identify / y-n, alternative, t-f / multiple-choice

Prompt

Passage / make a yes–no question of identifying information that is explicitly shown in the text. After questions, give ‘yes or no’ choice option

Question

Is the public’s reliance on rapid, intuitive, affect-driven sources of information processing a result of their limited capacity to comprehend science? Yes or no: No

1.3.2 Example 2

A1

Identify / y-n, alternative, t-f / multiple-choice

Prompt

Passage / Make a multiple choice, ‘wh’ question asking you to choose what is the topic of the passage

Question

What is the main topic discussed in the passage?

A) The public’s limited capacity to comprehend science

B) The Earth’s orbit around the Sun

C) Nitrogen in the atmosphere

D) Two-by-two contingency tables for medical testing

E) Heuristics and polarization in the public’s thinking about science-related issues

1.3.3 Example 3

E4

Inference / cloze / multiple-choice

Prompt

Passage / Make a cloze test with multiple-choice at the sentence that contains information that can be implicitly inferred from the text. Rewrite the whole text with the cloze. Put the options at the end of the text

Question

Please fill in the blank with the correct answer:

The most plausible explanation for the science communication problem is the public’s limited capacity to comprehend science. The public is only modestly science literate. About half, we are regularly reminded, understands that the earth orbits the sun in a year as opposed to a day; less than a quarter knows that nitrogen is the most common gas in the earth’s atmosphere; less than 10 percent can make sense of a two-by-two contingency table essential to determining the ratio of true to false positives when assessing medical test results. So how can members of the public possibly be expected to understand what scientists are saying when scientists try to explain complex issues like climate change or nuclear power? More importantly, members of the public do not think the way scientists do. () sources of information processing to the exclusion of the deliberate, conscious, analytic ones essential to making appropriate judgments of risk. As a result, they tend to overestimate the magnitude of more emotionally charged disasters (e.g., terrorist attacks) and discount more consequential but more temporally or emotionally remote ones (e.g., the impact of human-caused climate change). They also are more likely to rely on defective heuristics, such as crediting the opinions of their peers, a form of reasoning that can trigger self-reinforcing states of polarization

A) They rely on rapid, intuitive, affect-driven

B) They rely on slow, logical, affect-neutral

C) They rely on a combination of rapid, intuitive, affect-driven and deliberate, conscious, analytic

D) They rely solely on deliberate, conscious, analytic

1.3.4 Example 4

D5

Sentiment / wh-q / open-ended

Prompt

Passage / Make an open-ended question that asks the main characters' sentiments in the P1, using 'wh' question form

Question

What were Claire and Tony's feelings towards the situation with the oil-covered bird on the beach and the boys trying to feed it?

1.3.5 Example 5

C8

Irrelevant Sentence / wh-q / multiple-choice

Prompt

Passage / Put an irrelevant sentence somewhere in the text. Rewrite the text including the irrelevant sentence. Make a test question that requires me to choose an irrelevant sentence from the text

Question

The most plausible explanation for the science communication problem is the public’s limited capacity to comprehend science. The public is only modestly science literate. About half, we are regularly reminded, understands that the earth orbits the sun in a year as opposed to a day; less than a quarter knows that nitrogen is the most common gas in the earth’s atmosphere; less than 10 percent can make sense of a two-by-two contingency table essential to determining the ratio of true to false positives when assessing medical test results. The moon is made of cheese. So how can members of the public possibly be expected to understand what scientists are saying when scientists try to explain complex issues like climate change or nuclear power? More importantly, members of the public do not think the way scientists do. They rely on rapid, intuitive, affect-driven sources of information processing to the exclusion of the deliberate, conscious, analytic ones essential to making appropriate judgments of risk. As a result, they tend to overestimate the magnitude of more emotionally charged disasters (e.g., terrorist attacks) and discount more consequential but more temporally or emotionally remote ones (e.g., the impact of human-caused climate change). They also are more likely to rely on defective heuristics, such as crediting the opinions of their peers, a form of reasoning that can trigger self-reinforcing states of polarization

Test question:

Which sentence in the passage is irrelevant to the topic of science communication?

A) "About half, we are regularly reminded, understands that the earth orbits the sun in a year as opposed to a day"

B) "The moon is made of cheese"

C) "More important still, members of the public do not think the way scientists do."

D) "They also are more likely to rely on defective heuristics, such as crediting the opinions of their peers"

1.3.6 The teacher validation results of each prompt

Prompt    Mean   Variation   CVI    IRA (each item)   IRA
A1        3.50   0.29        1.00   1.00              0.76
B1        3.38   0.55        0.88   0.88
C1        3.88   0.13        1.00   1.00
D1        3.38   0.27        1.00   1.00
E1        3.63   0.27        1.00   1.00
F1        3.25   0.50        0.88   0.88
A2        2.88   0.41        0.75   0.75
B2        3.25   0.21        1.00   1.00
C2        3.50   0.29        1.00   1.00
D2        3.25   0.21        1.00   1.00
E2        3.38   0.27        1.00   1.00
F2        3.00   0.57        0.75   0.75
A3        3.13   0.70        0.75   0.75
B3        3.25   0.79        0.75   0.75
C3        3.50   0.29        1.00   1.00
D3        3.13   0.41        0.88   0.88
A4        2.75   0.79        0.50   0.50
B4        3.25   0.50        0.88   0.88
C4        3.75   0.21        1.00   1.00
D4        3.38   0.55        0.88   0.88
E4        3.38   0.55        0.88   0.88
F4        3.00   1.43        0.63   0.63
A5        3.25   1.07        0.88   0.88
B5        3.63   0.27        1.00   1.00
C5        3.63   0.27        1.00   1.00
D5        3.63   0.27        1.00   1.00
E5        3.88   0.13        1.00   1.00
F5        3.25   0.50        0.88   0.88
A6        3.25   0.50        0.88   0.88
B6        3.38   0.55        0.88   0.88
C6        3.50   0.29        1.00   1.00
D6        3.50   0.29        1.00   1.00
E6        3.50   0.29        1.00   1.00
F6        3.13   1.27        0.75   0.75
C8        3.50   0.57        0.88   0.88
C9        3.00   1.14        0.75   0.75
C10       3.38   0.55        0.88   0.88
Average   3.35               0.89
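The per-prompt statistics above can be illustrated with a short sketch. This section does not state how the values were computed, so everything here is an assumption: eight raters, a 4-point rating scale, Variation taken as the sample variance, and CVI taken as the share of raters giving a score of 3 or higher (a common convention). The rating lists are hypothetical and are not the study's data; they were chosen only because they reproduce the Mean, Variation, and CVI reported for A1 and A4 under these assumptions.

```python
from statistics import mean, variance


def summarize_ratings(ratings, relevant_from=3):
    """Return mean, sample variance, and CVI for one prompt's ratings.

    Assumption: CVI is the proportion of ratings at or above `relevant_from`.
    """
    agree = sum(1 for r in ratings if r >= relevant_from)
    return {
        "mean": round(mean(ratings), 2),
        "variance": round(variance(ratings), 2),  # sample variance (n - 1)
        "cvi": round(agree / len(ratings), 2),
    }


# Hypothetical ratings from eight raters on a 4-point scale (not the study's data).
high_agreement = [3, 3, 3, 3, 4, 4, 4, 4]
split_opinion = [2, 2, 2, 2, 3, 3, 4, 4]

print(summarize_ratings(high_agreement))  # mean 3.5, variance 0.29, cvi 1.0
print(summarize_ratings(split_opinion))   # mean 2.75, variance 0.79, cvi 0.5
```

The second list shows how a prompt can have a mean near the middle of the scale and still reach a CVI of only 0.50, because half of the hypothetical raters scored it below 3.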

  

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lee, U., Jung, H., Jeon, Y. et al. Few-shot is enough: exploring ChatGPT prompt engineering method for automatic question generation in english education. Educ Inf Technol (2023). https://doi.org/10.1007/s10639-023-12249-8

  • DOI: https://doi.org/10.1007/s10639-023-12249-8
