Vietnam Journal of Computer Science

Volume 1, Issue 2, pp 117–127

Automatic question generation for supporting argumentation

  • Nguyen-Thinh Le
  • Nhu-Phuong Nguyen
  • Kazuhisa Seta
  • Niels Pinkwart
Open Access
Regular Paper


Given a discussion topic, students sometimes fail to proceed with their argumentation. Can questions that are semantically related to the discussion topic help students develop further arguments? In this paper, we introduce a technical approach to generating questions at the request of students during collaborative argumentation. The contribution of this paper lies in combining different NLP technologies and exploiting semantic information to help users develop their arguments in a discussion session via tailored questions of different types.


Keywords: Question generation, Argumentation, WordNet

1 Introduction

In a constructivist learning environment, students are usually asked to solve an authentic problem. To solve the problem, students need to find a solution by researching, experimenting, and posing as well as testing hypotheses. Jonassen ([1], p. 226) proposed that a constructivist learning environment needs to provide some cognitive tools (also referred to as knowledge construction tools) that are used to “visualize (represent), organize, automate, or supplant thinking skills”. One class of cognitive tools is question generation systems, which pose questions related to the problem being solved. Question generation support can potentially help students while they gather information and build hypotheses: if a student is not able to come up with any idea for investigating the problem to be solved, the learning environment could generate semantically related questions for the student. Our hypothesis is that automatic question generation may be useful in a constructivist learning environment to support students in solving problems.

LASAD is a constructivist learning environment in which students discuss a given topic [2]. This web-based system provides tools for supporting collaborative argumentation. That is, given a discussion topic, students are required to develop their arguments by creating a diagram. Figure 1 illustrates an argumentation map created collaboratively by several users of the LASAD system. In the system, participants can use typed argument boxes (e.g., claim, fact, explanation) and typed links to represent relationships between the arguments (e.g., support, oppose).
Fig. 1

LASAD: a computer-supported collaborative argumentation system

In order to generate questions related to a discussion topic automatically, two question areas need to be addressed:
  1. What are the important concepts of a discussion topic? How can a question generation system recognize and extract them? Where can the question generation system retrieve information related to the important concepts extracted from the discussion topic? The first contribution of this paper is an approach to answering these questions. The approach consists of the following processes:
    • Analyzing grammatical structure of natural language

    • Extracting main concepts from documents

    • Searching related concepts in a semantic network

  2. How can a question generation system use the extracted information to generate questions that are intended to help participants of an argumentation session expand their argumentation? The second contribution of this paper is a question generation approach which makes use of semantic information available on WordNet and consists of the following steps:
    • Generating questions using question templates

    • Generating questions using a syntax-based question generation system

To illustrate the semantics-based question generation approach in this paper, we use the computer-supported collaborative argumentation system LASAD and the English language as a case study.

2 Question generation—state of the art

In order to generate questions related to a discussion topic, the topic statement is taken as the basic input. The topic statement is supposed to consist of one or several (grammatically correct) sentences which can serve question generation. Existing question generation approaches can be classified into three classes: syntax-based, template-based, and semantics-based approaches.

Syntax-based question generation systems work through three steps: (1) delete the identified target concept, (2) place a determined question key word in the first position of the question, and (3) convert the verb into a grammatically correct form considering auxiliary and modal verbs. For example, the question generation system of Varga and Le [3] uses a set of transformation rules for question formation. For subject–verb–object clauses whose subject has been identified as a target concept, a “Which Verb Object” template is selected and matched against the clause. The question word “Which” then replaces the target concept in the selected clause. For key concepts that are in the object position of a subject–verb–object clause, the verb phrase is adjusted (i.e., an auxiliary verb is used). Varga and Le reported that generated questions achieved a score of 2.45 (2.85) with respect to relevance and a score of 2.85 (3.1) with respect to syntactic correctness and fluency on a scale from 1 to 4, with 1 being the best score. Values outside and inside the brackets indicate the ratings of the first and second human rater, respectively.

The second approach, which is also employed widely in several question generation systems, is template-based [4]. The template-based approach relies on the idea that a question template can capture a class of questions that are context specific. For example, Chen and colleagues [4] developed the following templates: “What would happen if \(\langle \)X\(\rangle \)?” for conditional text, “When would \(\langle \)X\(\rangle \)?” and “What happens \(\langle \)temporal-expression\(\rangle \)?” for temporal context, and “Why \(\langle \)auxiliary-verb\(\rangle \langle \)X\(\rangle \)?” for linguistic modality, where the place-holder \(\langle \)X\(\rangle \) is mapped to semantic roles annotated by a semantic role labeler. These question templates can only be used for these specific entity relationships. For other kinds of entity relationships, new templates must be defined. Hence, the template-based question generation approach is mostly suitable for applications with a special purpose.

In addition to questions that can be generated using phrases in a statement, semantic information related to the issue in a statement can also be exploited to generate semantics-based questions. For example, Jouault and Seta [5, 6] proposed to query information from Wikipedia to facilitate learners’ self-directed learning. Using this system, students in self-directed learning are asked to build a timeline of events of a history period with causal relationships between these events, given an initial document (that can be considered a problem statement). The student develops a concept map containing a chronology by selecting concepts and relationships between concepts from the given initial Wikipedia document to deepen his or her understanding. While the student creates a concept map, the system also integrates each concept into its own map, which it generates by referring to sources of semantic information, i.e., DBpedia [7] and Freebase [8]. The authors used ontological engineering and linked open data (LOD) techniques [9] to generate semantics-based adaptive questions and to recommend documents from Wikipedia to help students create concept maps for the domain of history. The system’s concept map is updated with every modification of the student’s one. In addition, the system extracts semantic information from DBpedia and Freebase to select and add related concepts to the existing map. Thus, the system’s concept map always contains more concepts than the student’s map. Using these related concepts and their relations, the system generates questions that lead the student to a deeper understanding without forcing him or her to follow a fixed path of learning. One expected advantage of adopting semantic information rather than natural language resources is that the system can give adequate advice based on machine-understandable domain models without worrying about the ambiguity of natural language.

From a technical point of view, automatic question generation can be achieved using a variety of natural language processing techniques which have gained wide acceptance. However, successful deployment of question generation in educational systems is rarely found in the literature. Using the semantic information available on the Internet to generate questions that support learning is a relatively new research area that has only recently come under investigation. How can semantic information available on the Internet be processed to generate questions that support students learning in a constructivist environment? The question generation approach proposed in this paper employs all three existing approaches (syntax-based, template-based, and semantics-based) and uses semantic information provided on WordNet to generate questions. The question generation system (QGS) which applies this approach is described in the following section.

3 Question generation

The purpose of generating questions in the context of this paper is to give students ideas related to a discussion topic and to guide them in expanding the topic and continuing their argumentation. As input, the question generation system takes an English text from the discussion topic, which is provided by participants of an argumentation system. The text of a discussion topic can be an individual word, a list of words, a phrase, a sentence/question, or a paragraph. Therefore, the first step of QGS is to accept all types of text listed in Table 1 as input and to recognize and understand the content of the discussion topic.
Table 1

Types and examples of input text

| Type of text | Example |
| --- | --- |
| Individual word | Energy/ noun/ go |
| List of words | Energy, activation energy, and heat energy |
| Phrase | Problem of meltdowns |
| Sentence/question | Should we stop nuclear energy? |
| Paragraph | Nuclear energy is dangerous and should not be used. As we know that nuclear energy or nuclear power is a somewhat dangerous, potentially problematic. There are too many problems about this kind of energy, such as radiation, meltdowns, and waste disposal |

Semantics-based question generation approaches use a source of semantic information which is related to the topic being discussed. Since in this paper we focus on using information available on the Internet for generating questions, the source of “semantic information” we look for is on the semantic web. For example, Wikipedia provides definitions of words and descriptions of concepts. While Wikipedia might contain incorrect information due to its contribution mechanism, one of its advantages is that the definitions of many concepts are available in many different languages. If we want to develop a question generation system for different languages, Wikipedia might thus be an appropriate source. Besides Wikipedia, WordNet also provides a source of semantic information which can be related to a discussion topic. WordNet [10] is an online lexical reference system for English. Each noun, verb, or adjective represents a lexical concept and has a relation link to hyponyms which represent related concepts. In addition, for most of its vocabulary WordNet provides example sentences which can be used for generating questions. For example, if we input the word “energy” into WordNet, an example sentence like “energy can take a wide variety of forms” is available for this word. If we look for hyponyms of this word, there is a list of direct hyponyms and a list of full hyponyms. The list of direct hyponyms provides concepts which are directly related to the word being searched. For “energy”, for example, we can find the following direct hyponyms on WordNet: “activation energy”, “alternative energy”, “atomic energy”, “binding energy”, “chemical energy”, and more. The list of full hyponyms contains a hierarchy of hyponyms which represent directly and indirectly related concepts of the word being searched. One of the advantages of WordNet is that it provides accurate information (e.g., hyponyms) and grammatically correct example sentences. For this reason, we use WordNet to generate questions which are relevant and related to a discussion topic.

Concerning the types of questions to be generated, Graesser and Person [11] proposed 16 question categories: verification, disjunctive, concept completion, example, feature specification, quantification, definition, comparison, interpretation, causal antecedent, causal consequence, goal orientation, instrumental/procedural, enablement, expectation, and judgmental. The first 4 categories were classified as simple/shallow, 5–8 as intermediate, and 9–16 as complex/deep questions. This question taxonomy can be used to define appropriate question templates for generating questions.

Questions can also be generated using just main concepts available in a discussion topic (Sect. 3.5). The question generation approach proposed in this paper will use hyponyms (Sect. 3.6) and example sentences (Sect. 3.7) provided on WordNet for generating semantics-based questions. The question generation approach consists of the following steps.

3.1 Analyzing grammatical structure of natural language

In order to recognize and understand the content of a discussion topic, a natural language parser is used to analyze the grammatical structure of a sentence or a string of words into its constituents, resulting in a parse tree showing their syntactic relation to each other. This parser groups words together (as “phrases”) and determines the roles of words in each sentence, for instance, subject, verb, or object.

Nouns and noun phrases play a central role in extracting important concepts from a text. For example, the important concept in example (Ex. 3.1) is “nuclear energy”; the important concepts in example (Ex. 3.2) are “charity” and “energy”; and in example (Ex. 3.3), the main concepts are “heat energy” and “type of energy”. All of them are nouns or noun phrases.
  • Ex. 3.1: Should we stop nuclear energy?

  • Ex. 3.2: We will discuss charity and energy

  • Ex. 3.3: Heat energy is one of a type of energy

  • Ex. 3.4: Parents

  • Ex. 3.5: Go

Although there is only one word, “parents”, in example (Ex. 3.4), it is a noun and thus can be the subject of an argumentation. Additionally, it is possible to generate meaningful questions about “parents”. In contrast, questions for the word “go”, a verb in example (Ex. 3.5), can be almost meaningless. Therefore, a given text must first be analyzed and parsed with a natural language parser to determine which words should be considered important concepts. If there are errors or problems in this analyzing and parsing step, the correctness of the questions generated by QGS can be affected.

3.2 Extracting main concepts from documents

After analyzing and parsing the discussion topic with a natural language parser, QGS extracts all important concepts, which are defined as the nouns and noun phrases in a discussion topic (Sect. 3.1). In order to retrieve more information, every extracted noun or noun phrase is used as a resource to search for its related concepts (hyponyms and example sentences of each hyponym) in the WordNet [12] database.

The concepts retrieved from the WordNet database play important roles for the question generation steps in QGS. Hyponyms give participants of an argumentation session more information related to the extracted nouns and noun phrases. Example sentences for each hyponym add information to that hyponym and might help the participant of an argumentation session understand the use of that hyponym. Therefore, not only the nouns or noun phrases extracted from a discussion topic can be subjects for generating questions, but also hyponyms and example sentences provided on WordNet.

Some noun phrases are less important than individual nouns. For example, the noun phrase in example (Ex. 3.6) cannot be found in the WordNet database. QGS, therefore, only needs to extract the word “sea” as a resource for its next steps.
  • Ex. 3.6: Deep blue sea (Type: Adjective \(+\) Adjective \(+\) Noun)

  • Ex. 3.7: Nuclear energy (Type: Noun \(+\) Noun)

  • Ex. 3.8: Energy of activation (Type: Noun \(+\) “of” \(+\) Noun)

However, some noun phrases like examples (Ex. 3.7) and (Ex. 3.8) are common nouns and exist in the WordNet database, together with a semantic network and their hyponyms and example sentences. In this case, therefore, “Nuclear energy” is much more important than “nuclear” or “energy”, and “Energy of activation” is much more important than “energy” or “activation”. However, at least for brainstorming purposes, the more information a question generation system can extract and provide to its users, the more chances there are for stimulating good ideas. That is why not only individual nouns but also noun phrases, whose types are listed in Table 2, are extracted, as these noun phrases might be found in the WordNet database. Thus, the result of the concept extraction for example (Ex. 3.6) is “sea”. For example (Ex. 3.7), “nuclear”, “energy”, and “nuclear energy” are the results of the concept extraction. The results for example (Ex. 3.8) are “energy”, “activation”, and “energy of activation”. Table 2 also contains the noun phrase type “Adjective \(+\) Noun”, because example (Ex. 3.1) “Should we stop nuclear energy?”, as parsed by the Stanford Parser, results in the noun phrase “nuclear energy” of type “Adjective \(+\) Noun”.
Table 2

Types of extracted noun phrases


Noun \(+\) Noun

Noun \(+\) “of” \(+\) Noun

Adjective \(+\) Noun

3.3 Searching related concepts in semantic network

As mentioned in Sect. 3.2, WordNet is used as a source of lexical knowledge for searching all concepts related to every noun or noun phrase extracted from a discussion topic. Thus, WordNet provides QGS with more information about the extracted words. However, WordNet does not contain nouns in plural form. For example, consider searching the WordNet database for concepts related to “fish” and “children” in example (Ex. 3.9); the result is unexpected.
  • (Ex. 3.9) All fish are good for children.

Even though “fish” and “children” are two very common and simple words, WordNet can only recognize “fish”, as its singular and plural forms are the same. WordNet in this case considers “fish” a noun in singular form and is able to extract information related to “fish” from its database. However, “children” is totally different from its singular form “child”. The word “children” does not exist in the WordNet database, nor does any connection between “children” and “child” exist in the database. Thus, WordNet is not able to recognize the query “children” and cannot provide any related concept.

To work around this limitation of the WordNet database, a Plural-to-Singular double-search method is introduced. First, QGS searches the WordNet database for the concepts related to every extracted noun or noun phrase as usual. If WordNet returns nothing for an extracted noun or noun phrase, this noun or noun phrase is assumed to be in plural form. QGS then tries to turn this plural form into a singular form by stripping common English word endings. For example, QGS removes the ending “-ren” from “children” and obtains “child” as a new word. After that, QGS searches the WordNet database for the concepts related to this new word one more time (second search).
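The double search can be sketched as follows. This is a minimal illustration, not the authors' implementation: `TOY_LEXICON` is a hypothetical stand-in for the WordNet noun index, and the list of endings in `singular_candidates` is an assumption chosen only to cover the examples discussed in this paper.

```python
# Hypothetical stand-in for the WordNet noun index; a real system would query WordNet.
TOY_LEXICON = {"child", "fish", "lady", "energy"}


def singular_candidates(word):
    """Return candidate singular forms obtained by stripping common endings."""
    candidates = []
    for ending in ("ren", "en", "es", "s"):  # illustrative list of endings (assumption)
        if word.endswith(ending) and len(word) > len(ending):
            candidates.append(word[: -len(ending)])
    if word.endswith("ies") and len(word) > 3:
        candidates.append(word[:-3] + "y")  # e.g. "ladies" -> "lady"
    return candidates


def double_search(word, lexicon=TOY_LEXICON):
    """Search the word as given; if that fails, search its candidate singular forms."""
    word = word.lower()
    if word in lexicon:               # first search
        return word
    for candidate in singular_candidates(word):
        if candidate in lexicon:      # second search
            return candidate
    return None                       # nothing found: the caller keeps the source noun


for noun in ("fish", "children", "ladies"):
    print(noun, "->", double_search(noun))
```

Trying several stripped forms and keeping the first one that the lexicon accepts avoids having to decide in advance which ending is the right one for a given noun.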

3.4 Question generation

In order to generate questions, the approach described in this paper proposes using the question templates in Table 3, where X is a noun or noun phrase extracted from a discussion topic, or a hyponym extracted from WordNet. The question templates are defined according to the question classification proposed in [12].
Table 3

Question templates proposed for QGS

| Type | Question template |
| --- | --- |
| Definition | What is \(\langle \)X\(\rangle \)? / What do you have in mind when you think about \(\langle \)X\(\rangle \)? / What does \(\langle \)X\(\rangle \) remind you of? |
| Feature specification | What are the properties of \(\langle \)X\(\rangle \)? / What are the (opposite)-problems of \(\langle \)X\(\rangle \)? / What features does \(\langle \)X\(\rangle \) have? |
| | What is an example of \(\langle \)X\(\rangle \)? |
| | Is there any problem with the arguments about \(\langle \)X\(\rangle \)? |
| | What do you like when you think of or hear about \(\langle \)X\(\rangle \)? |
| | How can \(\langle \)X\(\rangle \) be used today? |
| | How will \(\langle \)X\(\rangle \) look or be in the future, based on the way it is now? |
| | How many sub-topics did your partners talk about? / Which sub-topics do your partners focus on? |
| Concept Comparison | What is the difference or relations between these sub-topics? |

3.5 Question generation without using WordNet

Using question templates defined in Table 3, we are able to replace the placeholders by nouns and noun phrases extracted from a discussion topic. For example, the following question templates are filled with the noun phrase “nuclear energy” and result in some questions shown in Fig. 2.
  • What does \(\langle \)X\(\rangle \) remind you of?

  • What are the properties of \(\langle \)X\(\rangle \)?

  • What is an example of \(\langle \)X\(\rangle \)?

Fig. 2

Questions have been generated without using WordNet
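Filling the placeholder of a template with an extracted concept can be sketched as below. The three templates are taken from the list above; the function name `fill_templates` and the `{X}` placeholder syntax are illustrative assumptions, not part of QGS.

```python
# The three templates listed above, with {X} standing in for the placeholder <X>.
TEMPLATES = [
    "What does {X} remind you of?",
    "What are the properties of {X}?",
    "What is an example of {X}?",
]


def fill_templates(concept, templates=TEMPLATES):
    """Instantiate every template with the given noun or noun phrase."""
    return [template.format(X=concept) for template in templates]


for question in fill_templates("nuclear energy"):
    print(question)
```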

3.6 Question generation using hyponyms

In addition to generating questions without using WordNet, placeholders in question templates can also be filled with appropriate hyponym values for generating questions. For example, suppose the noun “energy” occurs in a problem statement. After inputting this noun into WordNet, we get several hyponyms, including “activation energy”. Using the question templates of the question class “Definition”, we are able to generate three possible questions for this hyponym (see Table 4).
Table 4

An example of question template for the question class “Definition”

| Question template | Generated question |
| --- | --- |
| What is \(\langle \)X\(\rangle \)? | What is activation energy? |
| What do you have in mind when you think about \(\langle \)X\(\rangle \)? | What do you have in mind when you think about activation energy? |
| What does \(\langle \)X\(\rangle \) remind you of? | What does activation energy remind you of? |

Exploiting hyponyms to generate questions, we propose to generate a main question and several supporting questions which help students to think deeper about an issue. The supporting questions can be generated using appropriate question templates. For example, we define Template 1 for the class of “Feature specification” questions. Supporting questions for this question class are instantiated using question templates 1.1, 1.2, and 1.3 (Table 5). Questions generated using these templates are instances of the question class “Expectation”.
Table 5

Templates for supporting questions

| Type | Question template |
| --- | --- |
| Feature specification | Template 1: What are the (opposite)-problems of \(\langle \)X\(\rangle \)? |
| Expectation | Template 1.1: What would you do if they were twice as big (or half as big)? |
| Expectation | Template 1.2: How would you think about or deal with them if you were in a different time period? |
| Expectation | Template 1.3: How could (opposite)-problems of \(\langle \)X\(\rangle \) be stopped? |

Figure 3 illustrates questions which have been generated using hyponyms on WordNet. At first, a list of hyponyms which are related to the noun “nuclear energy” is shown, followed by a list of generated questions. The supporting questions are indented (e.g., “What would you do if they were twice as big (or half as big)?”).
Fig. 3

Questions have been generated using hyponyms on WordNet
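The combination of one main question and its indented supporting questions can be sketched as follows. The template texts come from Table 5; the hyponym list is passed in by the caller (in QGS it would come from WordNet), and the function name and the four-space indentation are illustrative assumptions.

```python
MAIN_TEMPLATE = "What are the (opposite)-problems of {X}?"  # Template 1
SUPPORTING_TEMPLATES = [                                     # Templates 1.1 to 1.3
    "What would you do if they were twice as big (or half as big)?",
    "How would you think about or deal with them if you were in a different time period?",
    "How could (opposite)-problems of {X} be stopped?",
]


def questions_for_hyponyms(hyponyms):
    """Return one main question per hyponym, each followed by indented supporting questions."""
    lines = []
    for hyponym in hyponyms:
        lines.append(MAIN_TEMPLATE.format(X=hyponym))
        for template in SUPPORTING_TEMPLATES:
            # Templates without a placeholder are returned unchanged by format().
            lines.append("    " + template.format(X=hyponym))
    return lines


# Example hyponyms of "energy" mentioned earlier in the text.
for line in questions_for_hyponyms(["activation energy", "atomic energy"]):
    print(line)
```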

3.7 Question generation using example sentences

As discussed, in addition to hyponyms, WordNet also provides example sentences (for hyponyms) which are grammatically correct. We propose to make use of example sentences to generate questions. For example, for the sentence “Peter has 20 apples”, the following questions can be generated (Fig. 4):
Fig. 4

An example using the ARK Question Generation and the corresponding accuracy scores

There are existing successful question generation tools which are based on input texts. For example, ARK [13] is a syntax-based tool for generating questions from English sentences or phrases. The system operates on syntactic tree structures and defines transformation rules to generate questions. Heilman and Smith [13] reported that the system achieved 43.3 % acceptability for the top 10 ranked questions and produced an average of 6.8 acceptable questions per 250 words on Wikipedia texts. ARK also includes a question ranker, which scores and ranks every generated question. These scores indicate the estimated accuracy of the questions generated from a given sentence. For the input sentence “Peter has 20 apples”, ARK produces several questions with corresponding accuracy scores, and the generated question “How many apples does Peter have?” has the highest accuracy score (2.039) (Fig. 4).

The approach being proposed in this paper exploits the syntax-based question generation tool ARK for generating questions which are semantically related to a given discussion topic. Figure 5 illustrates several questions which have been generated using example sentences available on WordNet when given a discussion topic “Should we stop nuclear energy?” Using the syntax-based question generation tool ARK, we are able to add one more question type (Concept Completion) to QGS and strengthen questions of the type quantification and verification. With the use of question templates (cf. Sect. 3.4) and a syntax-based question generation tool, QGS can then generate ten question types (Definition, Feature, Example, Judgment, Interpretation, Expectation, Verification, Quantification, Concept comparison, Concept completion).
Fig. 5

Questions have been generated using example sentences on WordNet

4 Implementation

The question generation system which is connected with the argumentation system LASAD consists of the following components: the Stanford Parser, a Noun Extractor, a Data Storage, a pool of Question Templates, the ARK syntax-based question generation tool, and WordNet 2.1 as a source of lexical knowledge (Fig. 6).
Fig. 6

The architecture of the integration of a question generation system in LASAD

For parsing English phrases, Link Grammar Parser [14] and Stanford Parser [15] (a lexicalized Probabilistic Context-Free Grammar (PCFG) parser) are currently two of the best parsers. Link Grammar Parser is a rule-based analyzer, which is essential to obtain accurate results. However, a statistical parser like Stanford Parser, which is written in Java, is more tolerant of words and constructions that are not grammatically correct. Even if there are grammatical errors (e.g., “Parents always does loves their childs”), a parse tree can still be created by the Stanford Parser. For this reason, we used Stanford Parser to analyze the grammatical structure of input sentences.

The noun extractor has been developed to extract main concepts from a discussion topic (cf. Sect. 3.2). It takes a text, which can be a word, a phrase, a sentence, or a paragraph, as input and returns a list of extracted nouns and noun phrases (\(L_{\mathrm{result}}\)) as output. The algorithm starts by taking the best parse tree (the one with the highest parse score) returned by Stanford Parser. The parse tree is then used to obtain a list of nouns. The algorithm for extracting nouns is illustrated as pseudo-code in Fig. 7.
Fig. 7

Noun and noun phrase extractor
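Since the pseudo-code of Fig. 7 is not reproduced here, the following is only a rough sketch of the idea and not the authors' algorithm. It assumes the input has already been tagged with Penn Treebank part-of-speech tags (given as `(word, tag)` pairs, a simplification of the Stanford parse tree) and extracts individual nouns plus the three noun phrase types of Table 2. The function name `extract_concepts` is hypothetical.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}  # Penn Treebank noun tags
ADJ_TAGS = {"JJ", "JJR", "JJS"}           # Penn Treebank adjective tags


def extract_concepts(tagged):
    """Collect nouns and noun phrases (Table 2 types) from (word, tag) pairs."""
    result = []

    def add(concept):
        concept = concept.lower()
        if concept not in result:  # keep first occurrence only, preserve order
            result.append(concept)

    for i, (word, tag) in enumerate(tagged):
        if tag not in NOUN_TAGS:
            continue
        add(word)  # every individual noun is a concept
        if i >= 1:
            prev_word, prev_tag = tagged[i - 1]
            # "Noun + Noun" and "Adjective + Noun"
            if prev_tag in NOUN_TAGS or prev_tag in ADJ_TAGS:
                add(f"{prev_word} {word}")
        # "Noun + of + Noun"
        if i >= 2 and tagged[i - 1][0].lower() == "of" and tagged[i - 2][1] in NOUN_TAGS:
            add(f"{tagged[i - 2][0]} of {word}")
    return result


# Ex. 3.1, with tags written by hand for illustration
sentence = [("Should", "MD"), ("we", "PRP"), ("stop", "VB"),
            ("nuclear", "JJ"), ("energy", "NN"), ("?", ".")]
print(extract_concepts(sentence))
```

For the hand-tagged phrases “nuclear energy” (Noun + Noun) and “energy of activation”, this sketch returns the same sets of concepts as those listed in Sect. 3.2.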

To reduce the time spent searching for and extracting nouns on WordNet, the Data Storage component works as a history tracer. It stores all the nouns and noun phrases extracted from the text given by users, along with their generated question lists, in an XML file. If the nouns extracted from the discussion topic statement exist in the Data Storage, QGS only needs to retrieve the matching question list for each noun and noun phrase from the Data Storage and return these lists to users. In this case, the system does not have to generate questions for each noun phrase again, which improves the performance of the system.

In order to retrieve semantic information, we use the latest version of the WordNet database 2.1 for Windows. The ARK question generation tool has been described in Sect. 3.7. The process of question generation consists of five steps:
  • Step 1: parse input text and analyze grammatical structure using the Stanford Parser.

  • Step 2: extract nouns/noun phrases using the Noun Extractor.

  • Step 3: search for the extracted nouns and noun phrases in the Data Storage. If they exist, QGS extracts the matching question lists out of Data Storage and starts Step 5. If the extracted nouns and noun phrases are not stored in Data Storage, QGS starts Step 4.

  • Step 4: input extracted nouns and noun phrases into the WordNet database; QGS then extracts all matching hyponyms and example sentences.

  • Step 5: Questions are generated based on extracted hyponyms and example sentences provided on WordNet using the pool of Question Templates and the ARK component. Pairs of noun/noun phrase and generated questions are stored in the Data Storage. In addition to generated questions using WordNet, nouns and noun phrases extracted from the discussion topic are also used as input to generate questions.
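The interplay of Steps 3 to 5 with the Data Storage can be sketched as a simple cache lookup. This is a minimal sketch under stated assumptions: a Python dictionary stands in for the XML-based Data Storage, and `lookup_related` and `make_questions` are hypothetical callables standing in for the WordNet search and for the template- and ARK-based generation.

```python
def generate_with_cache(concepts, storage, lookup_related, make_questions):
    """Return a question list per concept, reusing stored lists where available."""
    results = {}
    for concept in concepts:
        if concept not in storage:                               # Step 3: not stored yet
            related = lookup_related(concept)                    # Step 4: query the lexicon
            storage[concept] = make_questions(concept, related)  # Step 5: generate and store
        results[concept] = storage[concept]
    return results


# Toy stand-ins used only to demonstrate the caching behaviour.
lookup_calls = []


def fake_lookup(concept):
    lookup_calls.append(concept)
    return ["activation energy"] if concept == "energy" else []


def fake_questions(concept, related):
    return [f"What is {x}?" for x in [concept] + related]


storage = {}
first = generate_with_cache(["energy"], storage, fake_lookup, fake_questions)
second = generate_with_cache(["energy"], storage, fake_lookup, fake_questions)
print(first["energy"])
print("lexicon lookups:", len(lookup_calls))
```

In this toy run, the second call is answered entirely from the storage, so the lexicon is queried only once.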

5 Evaluation

In this section, we report on the evaluation of the utility of the algorithm for extracting main concepts from a discussion topic, the quality of the generated questions, and the efficiency of the Plural-to-Singular transformation method.

5.1 Extracting main concepts

In order to examine if QGS recognizes and extracts the main concepts (the nouns and noun phrases) from the input text (discussion topic) correctly, the following paragraph was used as input:

“As we know that nuclear energy or nuclear power is somewhat dangerous, potentially problematic. There are too many problems about this kind of energy, such as: Radiation, Meltdowns, and Waste Disposal. Radiation is dangerous. Radiation of nuclear waste and maintenance materials is not easily dealt with. Moreover, expensive solutions are needed to contain, control, and shield both people and the environment from its harm”.

Using this paragraph as input, QGS could recognize and extract 13 out of the 14 expected results that are listed in Table 6. It could not extract the phrase “radiation of nuclear waste”, as the structure of this phrase was not declared for QGS. However, QGS extracted nine additional nouns (from #15 to #23). Eight out of these nine nouns were acceptable; only the noun “many problems” (#17) was almost meaningless compared to its original “problem” (#3). For brainstorming purposes, the use of additional nouns can actually be very helpful during the next steps of the question generation process. For example, with the additional noun “energy”, QGS gave users information about types of energy such as solar energy, wind energy, etc. The users, therefore, could develop their argument, for instance, “if nuclear energy is too dangerous, solar energy or wind energy may be the replacement solution”.

5.2 Generating questions

After checking the ability to extract main concepts from a discussion topic, we now examine whether QGS generated enough questions.

Eighteen question templates were used to generate questions not only for the main concepts extracted from a discussion topic, but also for any related concept extracted from WordNet. Out of the eighteen question templates, four templates for the question types Quantification and Concept Comparison were only used for main concepts for which the system could find at least two related hyponyms on WordNet. In addition, the number of generated questions also depended on the syntax-based question generation tool. For example, when generating questions for “nuclear energy” in Table 6, QGS found one hyponym “atom energy” and one example sentence “nuclear energy regarded as a source of electricity for the power grid (for civilian use)” related to “nuclear energy” in the WordNet database. Therefore, QGS generated:
  • Fourteen questions for “nuclear energy” using question templates (the four Quantification and Concept Comparison templates did not apply, as there was only one hyponym)

  • Fourteen questions for “atom energy” using question templates (again without the four questions of type Quantification and Concept Comparison)

  • Four questions using the syntax-based question generation tool ARK (Table 7).
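
The counts above can be reproduced with a small sketch. It assumes that each applicable template yields exactly one question per concept, which is consistent with the numbers reported here but is not stated as a rule in the text; the function and constant names are hypothetical and are not part of QGS.

```python
# Illustrative sketch only: names are hypothetical, not taken from QGS.
TOTAL_TEMPLATES = 18      # question templates mentioned in the text
HYPONYM_TEMPLATES = 4     # Quantification and Concept Comparison templates
MIN_HYPONYMS = 2          # hyponyms needed before those four templates apply


def template_question_count(is_main_concept, hyponym_count):
    """Number of template-based questions for one concept.

    Assumption: one question per applicable template.
    """
    if is_main_concept and hyponym_count >= MIN_HYPONYMS:
        return TOTAL_TEMPLATES
    return TOTAL_TEMPLATES - HYPONYM_TEMPLATES


# "nuclear energy": main concept with a single hyponym ("atom energy").
nuclear_energy = template_question_count(True, 1)
# "atom energy": related concept, so the four templates are skipped;
# its hyponym count is irrelevant here and set to 0 as a placeholder.
atom_energy = template_question_count(False, 0)
ark_questions = 4         # questions produced by the syntax-based tool ARK

total = nuclear_energy + atom_energy + ark_questions
print(nuclear_energy, atom_energy, ark_questions, total)
```

Under these assumptions the example yields 14 + 14 + 4 = 32 questions for the concept “nuclear energy”.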

In summary, QGS generated as many questions as expected. Note that some generated questions (e.g., those generated by the syntax-based question generation tool ARK) are not sound, although they appear to be grammatically correct. They therefore need to be investigated with respect to their appropriateness (Table 7).
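
Table 6 List of main concepts extracted from an input paragraph

| # | Expected result | Actual result |
| --- | --- | --- |
| 1 | Nuclear energy | Nuclear energy |
| 2 | Nuclear power | Nuclear power |
| 3 | | |
| 4 | Kind of energy | Kind of energy |
| 5 | | |
| 6 | | |
| 7 | Waste disposal | Waste disposal |
| 8 | Radiation of nuclear waste | |
| 9 | Nuclear waste | Nuclear waste |
| 10 | Maintenance materials | Maintenance materials |
| 11 | Expensive solutions | Expensive solutions |
| 12 | | |
| 13 | | |
| 14 | | |
| 15 | | |
| 16 | | |
| 17 | | Many problems |
| 18 | | |
| 19 | | |
| 20 | | |
| 21 | | |
| 22 | | |
| 23 | | |
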
Table 7 Questions generated by a syntax-based question generation component

| Example sentence related to “nuclear energy” on WordNet | Questions generated by ARK |
| --- | --- |
| Nuclear energy regarded as a source of electricity for the power grid (for civilian use) | What did nuclear energy regard as for the power grid? |
| | What regarded as a source of electricity for the power grid? |
| | What did nuclear energy regard as a source of electricity for? |
| | Did nuclear energy regard as a source of electricity for the power grid? |

5.3 Efficiency of using plural-to-singular method

The WordNet database was used as the only semantic resource providing the concepts for the question generation process. Since WordNet does not contain nouns in plural form, the plural-to-singular double search method was introduced in Sect. 3.3. To search for a noun in the WordNet database, this method first assumes that the source noun is in plural form and derives candidate singular forms from it (e.g., “ladies” could be transformed into “ladie”, “ladi”, and “lady”). If, after trying all candidate forms, the system still finds no related concept for any predicted singular form, it concludes that the source noun is not in plural form and keeps the source noun unchanged.
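
The lookup described above can be sketched as follows. This is a minimal illustration, not the QGS implementation: the three suffix rules are chosen only so that they reproduce the “ladies” example, the function names are hypothetical, and a plain Python set stands in for the WordNet database.

```python
# Illustrative sketch: the suffix rules and names below are assumptions.
def singular_candidates(noun):
    """Guess singular forms, assuming that `noun` is in plural form."""
    word = noun.lower()
    candidates = []
    if word.endswith("s"):
        candidates.append(word[:-1])        # "ladies" -> "ladie"
    if word.endswith("es"):
        candidates.append(word[:-2])        # "ladies" -> "ladi"
    if word.endswith("ies"):
        candidates.append(word[:-3] + "y")  # "ladies" -> "lady"
    return candidates


def to_singular(noun, lexicon):
    """Return the first candidate found in `lexicon`.

    `lexicon` is a stand-in for a WordNet lookup. If no candidate is
    found, the noun is treated as not plural and returned unchanged.
    """
    for candidate in singular_candidates(noun):
        if candidate in lexicon:
            return candidate
    return noun


toy_lexicon = {"lady", "energy", "axe", "axis", "bus", "buss"}
print(to_singular("ladies", toy_lexicon))
print(to_singular("energy", toy_lexicon))
```

Because this sketch returns the first candidate that is found, a plural with several valid singular forms resolves to only one of them.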

In a test with a list of irregular plural nouns (footnote 3) containing 182 plural nouns of all types, namely special irregular plural nouns (e.g., children, people, men) and irregular plural nouns ending with -s, -x, -es, -ves, -ies, -ices, -a, -i, and -im, the plural-to-singular method correctly found the singular forms of 173 of the 182 nouns (about 95 %). It failed on compound plural nouns (“sons-in-law”, “runners-up”) and on plural nouns ending in “aux” (“Beaux”, “Beraux”, “Chateaux”, “Plateaux”, “Tableaux”). In addition, the system handled some plural nouns only partially: for “axes” it detected only “axe” and not “axis”, and for “busses”, which is the plural form of both “bus” and “buss”, it detected only “buss”. In summary, the plural-to-singular double search method improved the usability of the WordNet database, as users were not forced to enter nouns in singular form to receive results from WordNet and support from QGS.

6 Discussion, conclusion, and future work

In this paper we have proposed an approach to generating questions to help students during brainstorming activities in which they expand their arguments when participating in a discussion. The approach proposed in this paper combines three question generation approaches: syntax-based, template-based, and semantics-based. This approach generates not only questions based on the main concepts of a given discussion topic, but also questions based on semantic information available on WordNet.

The question generation method proposed in this paper may have drawbacks that stem from its use of question templates: templates can be very specific to a particular domain, as discussed in Sect. 3. We were aware of this problem and tried to define question templates that are general enough for several discussion domains. For example, the question templates in Table 3 can be used to generate questions for the discussion topic “charity” by replacing the placeholder: (1) What is charity? (2) What do you have in mind when you think about charity? (3) What does charity remind you of? These questions are appropriate for helping participants think about the topic when they have to discuss charity. Whether all question templates are general enough for other discussion topics still needs to be evaluated and is part of our future work. Since the goal of our research is to support students during brainstorming and argumentation activities, we also intend to conduct an empirical evaluation study for this purpose.
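
The placeholder replacement in the “charity” example can be illustrated with a short sketch. The three template strings are taken from the example above; the placeholder syntax `{concept}` and the function name are assumptions made for this illustration and may differ from the notation used in Table 3.

```python
# Illustrative sketch: placeholder syntax and names are assumptions.
TEMPLATES = [
    "What is {concept}?",
    "What do you have in mind when you think about {concept}?",
    "What does {concept} remind you of?",
]


def fill_templates(concept, templates=TEMPLATES):
    """Instantiate each template by substituting the concept."""
    return [template.format(concept=concept) for template in templates]


for question in fill_templates("charity"):
    print(question)
```

The same call with another topic, for example `fill_templates("nuclear energy")`, shows why domain-independent wording of the templates matters: the templates are reused verbatim for every concept.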


  1. The syntax-based approach is also referred to as transformation-based in literature because transformation rules are defined and applied on syntax structures of input sentences.
  2.
  3.



We would like to thank Jouault Corentin (Osaka Prefecture University) for preparing literature for the state of the art described in this paper.


  1. Jonassen, D. H.: Designing constructivist learning environments. In: Reigeluth, C. M. (eds.) Instructional Design Theories and Models: A New Paradigm of Instructional Theory, vol. 2, pp. 215–239. Lawrence Erlbaum, Hillsdale (1999)
  2. Loll, F., Pinkwart, N., Scheuer, O., McLaren, B. M.: Simplifying the development of argumentation systems using a configurable platform. In: Pinkwart, N., McLaren, B. M. (eds.) Educational Technologies for Teaching Argumentation Skill. Bentham Science Publishers, Sharjah (2012)
  3. Varga, A., Le, A. H.: A question generation system for the QGSTEC 2010 Task B. In: Proc. of the 3rd WS. on Question Generation, held at the ITS Conf., pp. 80–83 (2010)
  4. Chen, W., Aist, G., Mostow, J.: Generating questions automatically from informational text. In: Proceedings of the 2nd Workshop on Question Generation, held at the Conference on AI in Education, pp. 17–24 (2009)
  5. Jouault, C., Seta, K.: Building a semantic open learning space with adaptive question generation support. In: Proceedings of the 21st International Conference on Computers in Education, pp. 41–50 (2013)
  6. Jouault, C., Seta, K.: Adaptive self-directed learning support by question generation in a semantic open learning space. Int. J. Knowl. Web Intell. (2014)
  7. Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., Hellmann, S.: DBpedia-A crystallization point for the Web of Data. Web Sem Sci Serv Agents World Wide Web 7(3), 154–165 (2009)
  8. Bollacker, K., Evans, C., Paritosh, P., Sturge, T., Taylor, J.: Freebase: a collaboratively created graph database for structuring human knowledge. In: Proceedings of the SIGMOD International Conference on Management of Data, pp. 1247–1250, ACM (2008)
  9. Heath, T., Bizer, C.: Linked Data: Evolving the Web into a Global Data Space. Morgan & Claypool Publishers, San Rafael (2011)
  10. Miller, G. A.: WordNet: a lexical database. Commun ACM 38(11), 39–41 (1995)
  11. Graesser, A. C., Person, N. K.: Question asking during tutoring. Am Educ Res J 31(1), 104–137 (1994)
  12. Heilman, M.: Automatic Factual Question Generation from Text. Ph.D. Dissertation, Carnegie Mellon University. CMU-LTI-11-004 (2011)
  13. Heilman, M., Smith, N. A.: Question Generation via Overgenerating Transformations and Ranking. Technical report, Language Technologies Institute, Carnegie Mellon University. CMU-LTI-09-013 (2009)
  14. Sleator, D., Temperley, D.: Parsing English with a Link Grammar. In: Proceedings of the 3rd International Workshop on Parsing Technologies (1993)
  15. Klein, D., Manning, C. D.: Accurate Unlexicalized Parsing. In: Proceedings of the 41st Meeting of the Association for Computational Linguistics, pp. 423–430 (2003)

Copyright information

© The Author(s) 2014

This article is published under license to BioMed Central Ltd. Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution and reproduction in any medium, provided the original author(s) and source are credited.

Authors and Affiliations

  • Nguyen-Thinh Le (1)
  • Nhu-Phuong Nguyen (2)
  • Kazuhisa Seta (3)
  • Niels Pinkwart (1)
  1. Humboldt Universität zu Berlin, Berlin, Germany
  2. Clausthal University of Technology, Clausthal-Zellerfeld, Germany
  3. Osaka Prefecture University, Sakai, Japan
