1 Introduction

This article presents a first proof of concept for the prototype platform of the project “IDeRBlog”. The name is an acronym of the German Individuell Differenziert Richtig schreiben mit Blogs, which translates literally into English as “Individually differentiated correctly writing by using blogs”. The project combines Technology Enhanced Learning (TEL) with Learning Analytics (LA) in the context of German orthography and spelling acquisition [2].

The IDeRBlog system provides a platform for children aged between 8 and 12 years. On this platform they can write and submit essays about their everyday life or about specific topics that are assigned or proposed by the teacher. Teachers can then review the texts on the platform, correct them, give feedback and hand them back to the students for further inspection. Before students hand in a text, the system offers them the possibility to check their spelling with the help of our “intelligent dictionary”. If a mistake has been categorized beforehand, the system gives feedback that encourages the user to think about the spelling and to apply a strategy in order to correct the mistake. In contrast to a conventional auto-correction system, which only indicates that a word is (possibly) wrong and may suggest the correct word - or a list of possible words - our system helps users gain deeper insight into the system of German orthography. The correct spelling of a word, without strategy-based feedback, is presented for only very few words. These are words that cannot be explained systematically by the system of German orthography and need to be memorized. Supporting children with specific feedback while they correct their texts is intended to lead to a deeper understanding of German orthography and its complex system.

The prototype is web-based because of the increasing use of computers, laptops and mobile devices with an internet connection [3]. This makes it possible to trace interactions [4] between students and the learning platform for later analysis. Another benefit of this approach is that children find writing with computers attractive [5]. The provided blog also gives pupils a reason to write, because they can publish their essays later on [6]. Therefore, we expect higher motivation for formulating and revising a text than in typical classroom essay writing [7].

Through this platform we expect to gain insights into a learner’s learning process [8] for early detection of learning issues. Teachers can then use this information to intervene [9, 10] and help pupils with the acquisition of German orthography. Therefore, the platform provides an area for teachers where they can correct and prepare the texts for publishing in the blog as well as write further feedback to the student.

1.1 Research Questions

In this article we will answer the following research questions to provide a proof of concept:

  • Does this first test prove our concept of the system?

  • How many spelling mistakes will be recognized by our intelligent dictionary?

  • Which are the mistakes the system cannot identify?

  • Which “teacher categories” are used most frequently?

1.2 Outline

The next section provides background about the German orthography, the intelligent dictionary and its system of categories. Further, the concept and the technology of the platform will be discussed. The third section will provide the results of the first test with texts provided by students. The last section discusses our findings, limitations and future work on this project.

2 Background

2.1 German Orthography

German orthography is much more transparent than English orthography, where “the alphabet contains just 26 letters [which] correspond to 44 phonemes associated with 102 functional spelling units” [11]. Nevertheless, it is not as transparent as, for example, Turkish orthography. Therefore, a lot of words - but not all - cannot be spelled by relying on the phoneme-grapheme correspondences alone, since other orthographic principles interfere. For example, the German word for hat, <Hut>, can be spelled correctly by relying on the phoneme-grapheme correspondences, whereas the German word for dog, <Hund>, would be spelled incorrectly by relying purely on the correspondences. The reason for this is the so-called phenomenon of terminal devoicing and the existence of the morphological principle. Because of terminal devoicing the word is pronounced as /hunt/, which would lead to the incorrect spelling <*Hunt>. At the beginning of a syllable the obstruent is pronounced voiced, as in /hundə/. Consequently, the spelling <Hund> is due to the morphological principle of German orthography. Because of this principle morphemes and words are spelled the same way in all possible words (e.g. <Hund, Hunde, Hündin>, not <*Hunt, Hunde, Hündin>). Furthermore, German orthography uses capitalization not only at the beginning of a sentence, but also within a sentence in order to mark nouns. Therefore, the reason for the correct spelling <Hund> in contrast to the incorrect spelling <*hund> lies in the syntactic principle.
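The contrast between <Hut> and <Hund> can be made concrete with a small sketch. It is an illustration only and not part of the IDeRBlog system: the function name `spell_as_pronounced` and the three-letter devoicing table are assumptions, and the rule is deliberately simplified (real German phonology has further cases, e.g. word-final <ng>).

```python
# Simplified illustration of terminal devoicing (hypothetical, not IDeRBlog code).
# Assumption: only a word-final <b>, <d> or <g> is affected.
FINAL_DEVOICING = {"b": "p", "d": "t", "g": "k"}


def spell_as_pronounced(word: str) -> str:
    """Return the spelling a learner would produce when relying purely on
    phoneme-grapheme correspondences for the last sound of the word."""
    if not word:
        return word
    last = word[-1]
    return word[:-1] + FINAL_DEVOICING.get(last, last)


for correct in ["Hut", "Hund"]:
    guess = spell_as_pronounced(correct)
    status = "correct" if guess == correct else "incorrect"
    print(f"<{correct}> spelled by ear: <{guess}> ({status})")
```

For <Hut> the spelling by ear matches the correct form, whereas for <Hund> it yields <*Hunt>; only the related form <Hunde>, in which the obstruent is voiced, reveals the <d>.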

The co-existence of these principles, which are described above in a very brief and superficial way (for a detailed description see e.g. [12]), often leads to the assumption that German orthography is unsystematic and illogical. One possible consequence is that children are confronted with an unsystematic way of instruction that focuses on learning by rote. Although mastery of orthography is rather important in German, since it carries considerable prestige, students experience spelling instruction as boring and formal [13]. “In contrast to other areas of language learning, there is hardly space to argue about the correct or incorrect spelling of a word. This orthographical stiffness can probably serve as an explanation for its importance” [7].

2.2 Intelligent Dictionary

The main idea is to improve the orthographic competence of pupils by letting them write essays on a platform that provides a special feature - the intelligent dictionary - which gives feedback that prompts them to think about and consequently correct the misspelled word. It offers the correct spelling only in a few selected cases. Unlike a conventional auto-correction system, the intelligent dictionary in general does not offer the correctly spelled word straightaway, which serves didactic purposes: First, students have to pay attention to the feedback and process it by applying it to the misspelled word. Second, this approach is based on a wider definition of spelling competence that includes not only a person’s knowledge of the correct spelling of given words and the rules of orthography, but also being sensitive to misspelled words, knowing how to correct them, and applying strategies to prevent spelling errors in the long run [7, 14]. Third, this system follows a modern approach to teaching and learning orthography, which considers the communicative aspect of writing (cf. e.g. [15]): The pupils work on their orthographic competence by writing essays which can also be published. Therefore, the motivation to correct the mistakes should be higher and it might be more attractive than doing conventional exercises that focus purely on orthography. Nevertheless, the platform offers online exercises and printable worksheets in order to work on a specific orthographic phenomenon. Although orthography is only one aspect of text writing skills among others, it is an important one to work on. This is shown by a large survey of various competences in the German language, which also included reading and listening among others: The results indicate that 27% of the tested Austrian pupils in grade 4 did not reach the standards for the application of correct orthography and punctuation in the task of producing texts (cf. [1]).

2.3 Categorization

In order to give feedback for correcting the mistakes and to offer a qualitative analysis of the mistakes, it is necessary to establish a complex system of categories. This system is developed on a linguistic and orthographic basis. Currently the system covers 28 categories, separated into 143 phenomena and 58 feedbacks.

The reason for these unequal numbers lies in the different requirements: The categories are visible to the users in the qualitative analysis. Therefore, their number should be kept as small as possible, but as exact as necessary. These 28 categories are also labeled “teacher categories”, since it is especially the teachers who will work with them. Due to the complexity of German orthography the possible mistakes must be divided into different phenomena in order to categorize the misspelled words exactly when constructing the intelligent dictionary. In the system each phenomenon is connected with a category. For the feedback it is possible to merge two or even more phenomena. This helps to keep the number of different feedbacks as small as possible so that the users become familiar with the different hints. With regard to the requirements of a scientific analysis, the fine-grained phenomena allow a deep analysis in order to gain a better understanding of the acquisition process in the long run. Consequently, it could become necessary to add new phenomena or delete existing ones, which is easier to manage at the level of phenomena.

To gain a better understanding of this system, an example of a category with its phenomena and feedback is given: The category “prefix” consists of 12 phenomena. Due to the morphological principle of German orthography a prefix is always spelled the same way in all possible words and word forms with this prefix. For example, the prefix ver- is always spelled as <ver>, as in verlaufen (to lose one’s way), verlieben (to fall in love), verreisen (to go on a journey), and the prefix ent- is always spelled as <ent>, as in entdecken (to discover) or entfernen (to remove). Each of the 12 phenomena of this category describes one prefix with its possible mistakes (e.g. <*fer> instead of <ver> in <*ferreisen> or <*end> instead of <ent> in <*enddecken>). Since spelling errors of this kind are very similar, the same feedback can be given for all 12 phenomena. Therefore, the pupils get the (literally translated) feedback “Think about the spelling of the word building brick”. This should guide the writer’s attention towards the prefix and enable him/her to correct it.

Linking the different phenomena with one category has the following advantages: First, it enables us to analyze each phenomenon separately in order to gain a better understanding of the use and frequency of spelling mistakes for each prefix. Second, we can add phenomena for prefixes that are not considered yet. Therefore, modifications concerning the phenomena can be undertaken without confusing the user.
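The relation between categories, phenomena and feedbacks described above can be sketched as a small data model. This is a minimal sketch under stated assumptions, not the project's actual schema: the class names `Feedback` and `Phenomenon`, the helper `phenomena_by_category` and the phenomenon labels are hypothetical, and only two of the 12 prefix phenomena are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    text: str  # hint shown to the pupil


@dataclass(frozen=True)
class Phenomenon:
    name: str           # fine-grained unit used for analysis
    category: str       # each phenomenon is connected with one teacher category
    feedback: Feedback  # several phenomena may share the same feedback


# One shared hint for all prefix phenomena (literal translation from the text).
PREFIX_HINT = Feedback("Think about the spelling of the word building brick")

phenomena = [
    Phenomenon("prefix ver-", "prefix", PREFIX_HINT),
    Phenomenon("prefix ent-", "prefix", PREFIX_HINT),
]


def phenomena_by_category(items):
    """Group phenomenon names under their teacher category."""
    grouped = {}
    for phenomenon in items:
        grouped.setdefault(phenomenon.category, []).append(phenomenon.name)
    return grouped


print(phenomena_by_category(phenomena))
print(len({p.feedback for p in phenomena}), "distinct feedback")
```

Because the feedback hangs on the phenomenon rather than on the category, a new prefix phenomenon could be appended to the list without changing what the pupil or the teacher sees.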

Since the lexicon of a language is endless, it is not - and will never be - possible to consider all words of a language and all possible mistakes of a specific word. Therefore, the development of the intelligent dictionary is currently based on the words of the basic vocabulary of three federal states in Germany (for details see [7]). For these words all word forms are considered. This is challenging especially in the German language since it has quite a rich morphology. The number of word forms for one word varies from one word form (e.g. prepositions) up to 17 different word forms (e.g. adjectives).

Based on these word forms the possible mistakes are derived and assigned to a phenomenon. Therefore, one word form can be connected with different misspelled words in different phenomena.
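How misspellings might be derived from word forms can be illustrated for the prefix phenomena. The following is a sketch only: the rule table `PREFIX_RULES` and the function `derive_mistakes` are hypothetical, and a real derivation would need morphological information to avoid treating words such as Ente (duck) as prefixed.

```python
# Hypothetical rules: (phenomenon name, correct prefix, typical wrong prefix).
PREFIX_RULES = [
    ("prefix ver-", "ver", "fer"),
    ("prefix ent-", "ent", "end"),
]


def derive_mistakes(word_forms):
    """Map each derived misspelling to (correct word form, phenomenon)."""
    lookup = {}
    for form in word_forms:
        for phenomenon, right, wrong in PREFIX_RULES:
            if form.lower().startswith(right):
                # Keep the capitalization of the first letter (nouns).
                swapped = wrong.capitalize() if form[0].isupper() else wrong
                lookup[swapped + form[len(right):]] = (form, phenomenon)
    return lookup


table = derive_mistakes(["verreisen", "entdecken", "entdeckten", "Entfernung"])
for misspelled, (correct, phenomenon) in table.items():
    print(f"<*{misspelled}> -> <{correct}> [{phenomenon}]")
```

Each word form contributes its own entry, which is why a rich morphology multiplies the number of misspellings that have to be stored.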

2.4 Platform

With the IDeRBlog platform we try to combine the development of writing skills, acquisition of orthographic competence and improving the reading skills with modern means of communication and digital instruments [7].

Figure 1 shows the IDeRBlog system, which can be used after prior registration with a separate user management system. It is a web-based application built with state-of-the-art technology such as HTML5, responsive web design and web services for native Android or iOS applications (under development). The application server handles the communication with the students and the teachers and is implemented with the Grails web application framework for Java platforms. Grails is based on Groovy and uses different established frameworks such as Spring and Hibernate. To ensure a clean and manageable project the Model View Controller (MVC) pattern is used.

Fig. 1. Architecture.

The text submitted by the student is first analyzed automatically for spelling mistakes. Here we use the conventional approach of dividing the text into sentences and further into tokens. After the part-of-speech tagging [16] the tokens are assigned to categories. Based on that information our intelligent dictionary provides age-appropriate feedback according to the detected spelling mistake and its phenomenon. As described above, the feedback is designed to encourage students to reflect on their spelling mistakes and become aware of the structure of the words [7, 20]. Additionally, spelling mistakes which have not been categorized by the intelligent dictionary are marked as spelling mistakes without specific feedback. Further, based on the errors that occurred and their corresponding categories, the platform can recommend exercises from the provided training database [17]. To illustrate how this system works in practice, Fig. 2 shows the feedback for two different mistakes.

Fig. 2. Text correction example.

Figure 2 shows a feedback example with a text that means in English “Today we discovered many new things in the woods. The distance between our camp and the river was very far”. The student made two quite similar spelling mistakes: enddeckten (‘(we) discovered’) with <*end> instead of <ent> and Endfernung (‘(the) distance’) with <*End> instead of <Ent>, which are shown to him/her with the appropriate hint for correction: “Think about the spelling of the word building brick”. As described in the background section, the writer’s attention should be guided to the prefix to enable him/her to correct it. The headline serves as an instruction, as it tells the pupil that he/she can see his/her mistakes and that he/she gets hints for correcting them. The hints appear when the pupil clicks on or hovers over the highlighted word.
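The analysis steps described above (sentence splitting, tokenization, lookup in the intelligent dictionary, generic marking of uncategorized mistakes) can be sketched as follows. This is a simplified sketch, not the platform's implementation: the part-of-speech tagging step is omitted, the regular expressions stand in for a real tokenizer, and the dictionaries `CATEGORIZED` and `KNOWN_WORDS` as well as the German sample sentence are invented for illustration.

```python
import re

HINT = "Think about the spelling of the word building brick"

# Hypothetical excerpt of the intelligent dictionary: misspelling -> hint.
CATEGORIZED = {"enddeckten": HINT, "Endfernung": HINT}
# Hypothetical list of known correct word forms.
KNOWN_WORDS = {"Heute", "wir", "viele", "Die", "war", "weit"}


def split_sentences(text):
    """Split after sentence-final punctuation followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Return words (including umlauts and ß) and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def check(text):
    """Return (token, status, hint) for every token flagged as a mistake."""
    results = []
    for sentence in split_sentences(text):
        for token in tokenize(sentence):
            if not token[0].isalnum():
                continue  # skip punctuation tokens
            if token in CATEGORIZED:
                results.append((token, "categorized", CATEGORIZED[token]))
            elif token not in KNOWN_WORDS:
                # Marked as a mistake, but without specific feedback.
                results.append((token, "uncategorized", None))
    return results


SAMPLE = "Heute enddeckten wir viele Geschneke. Die Endfernung war weit."
for token, status, hint in check(SAMPLE):
    print(token, status, hint or "(no specific feedback)")
```

In this sample the two prefix mistakes receive the shared hint, while the transposition <*Geschneke> is only marked as wrong.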

This intelligent dictionary is embedded in a platform that offers more features and has a specific workflow for pupils and teachers, as shown in Fig. 3 and described separately for pupils’ and teachers’ use.

Fig. 3. Workflow for students and teachers.

Workflow for Pupils

After login the pupils have several possibilities: They can start to write a new text in the writing area (1) or access the reports of their previously submitted texts and the evaluation carried out by the teacher; they can access the private/class/school-blog, where they can find published texts of other pupils, or they can work on recommended exercises in the training database.

In case they start to write a new text, this text will be analyzed orthographically by the intelligent dictionary in a first step (2) [7]. Proper feedback, based on the spelling mistake and the category, will be displayed to the student - as shown in Fig. 2. In this phase, he/she can continue to correct the text (3) which supports the self-reflexivity of spelling mistakes by trying to correct them independently [18] and finally submit the text to the teacher (4). After the correction by the teacher, the student is informed about the report (7) or the necessity to redo the text writing (7a), then the process starts again (1). If the teacher has finished the review and correction of the essay, the pupil can blog the text in one of the three available blogs (8). Further, based on the evaluation of the texts, exercises are recommended to the student for self-learning (9).

Workflow for Teachers

As soon as the pupil has submitted a text, the teacher gets a notification (5). The teacher can correct the submitted text within the platform with regard to various aspects and add personalized feedback. In the next step the teacher can either let the student edit the text according to the given feedback and resubmit it (7a) or make the final reviewed version available in the students’ area (7). Concerning the orthographic competence of a specific pupil or the class in general, the teacher can inspect the performance according to the qualitative analysis and decide to assign spelling exercises to the pupil and/or class (10).
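The life cycle of a single text in both workflows can be summarized as a small state machine. The sketch below is an interpretation of the numbered steps, not the platform's code: the state names, action names and the function `advance` are hypothetical.

```python
from enum import Enum


class TextState(Enum):
    DRAFT = "draft"          # pupil writes and corrects, steps (1) to (3)
    SUBMITTED = "submitted"  # handed in (4), teacher notified (5)
    REVIEWED = "reviewed"    # final version and report available (7)
    PUBLISHED = "published"  # blogged in one of the blogs (8)


# (current state, action) -> next state; action names are assumptions.
TRANSITIONS = {
    (TextState.DRAFT, "check_spelling"): TextState.DRAFT,
    (TextState.DRAFT, "submit"): TextState.SUBMITTED,
    (TextState.SUBMITTED, "request_revision"): TextState.DRAFT,  # (7a), back to (1)
    (TextState.SUBMITTED, "finish_review"): TextState.REVIEWED,
    (TextState.REVIEWED, "publish"): TextState.PUBLISHED,
}


def advance(state: TextState, action: str) -> TextState:
    """Apply an action and return the next state, or fail if it is not allowed."""
    key = (state, action)
    if key not in TRANSITIONS:
        raise ValueError(f"action {action!r} is not allowed in state {state.name}")
    return TRANSITIONS[key]


state = TextState.DRAFT
for action in ["check_spelling", "submit", "request_revision",
               "submit", "finish_review", "publish"]:
    state = advance(state, action)
print(state.name)  # PUBLISHED
```

Modelling the revision request (7a) as a transition back to the draft state mirrors the statement that the process then starts again at (1).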

3 Results

Since the platform will be used by schools in the course of 2016/2017, our initial research aims to prove our concept of the system. Since there are so many possibilities to spell a word incorrectly, it is important to test the system with authentic mistakes from authentic texts written by pupils of our target group. In order to conduct this analysis, we collected within the project group 60 essays written by 3rd-grade students aged around 8 years. These texts were digitized and anonymized.

3.1 Findings

The collected essays contain 405 sentences with 3792 tokens (words and punctuation marks). The number of characters is 19237 including white space (15694 without white space). In the collected essays 549 spelling mistakes can be found. Our intelligent dictionary responded to 95 of these 549 spelling mistakes with the appropriate feedback. The intelligent dictionary thus currently covers 17.3% of all spelling mistakes found in the 60 essays.
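The coverage figure follows directly from the reported counts, as the short calculation below shows. The variable names are chosen for illustration; the mistakes-per-essay value is a derived average that is not stated in the text.

```python
# Counts as reported for the 60 collected essays.
essays = 60
mistakes_total = 549
mistakes_with_feedback = 95

coverage = mistakes_with_feedback / mistakes_total
print(f"coverage of the intelligent dictionary: {coverage:.1%}")   # 17.3%
print(f"average mistakes per essay: {mistakes_total / essays:.2f}")  # 9.15
```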

The top 5 categories of the analysis based on the intelligent dictionary are (i) “gemination” (which means that only one consonant instead of two is spelled, e.g. <*gesamelt> instead of <gesammelt> ‘collected’), (ii) “complex graphemes” (which means that more than one letter is necessary for spelling one phoneme, e.g. <*speilen> instead of <spielen> ‘to play’), (iii) “use of lower case letters instead of upper case letters” (e.g. <*buch> instead of <Buch> ‘book’), (iv) “spelling of the s-sound” (e.g. <*weis> instead of <weiß> ‘white’), and (v) “word to memorize” (this category contains words that cannot be spelled correctly by applying a strategy, e.g. <*unt> instead of <und> ‘and’).

Table 1 shows the top 5 categories and the number of spelling mistake occurrences and percentage over all analyzed essays within the intelligent dictionary:

Table 1. Top 5 categories.

These top 5 categories cover 72.7% of the spelling mistakes found by our intelligent dictionary (see Fig. 4). Since the category “complex graphemes” also contains missing dieresis (e.g. <u> instead of <ü>), which is a common mistake in handwriting, this category will probably not rank as highly when children are typing on a keyboard, because all German letters that require dieresis (<ä, ü, ö>) are represented on the keyboard. Therefore, this phenomenon should not occur very often, whereas the phenomena of <ie> instead of <ei> and/or <ei> instead of <ie>, which also belong to the same category, are likely to occur. Problems in the field of capitalization are, like problems with spelling the different s-sounds, very common in the acquisition process.

Fig. 4. Top 5 categories.

As expected, not all mistakes are recognized by the intelligent dictionary. After the first proof of concept it is possible to describe these constraints more closely:

First, some categories and/or phenomena are considered in the system, but the corresponding spelling mistakes need to be collected on the basis of the texts written by the users. This is especially true for the use of English words in German texts (e.g. <*Capten> for <Captain>) and for names (e.g. <*Nickolaus> for <Nikolaus>).

Second, some spelling mistakes need to be analyzed by hand, because mistakes concerning phoneme-grapheme correspondences cannot be considered in advance: there are endless possibilities of disregarding these correspondences. As a consequence, the intended word is only recognizable from the context, because graphemes are, for example, either missing (e.g. <*Kunt> for <Kunst> ‘art’) or in the wrong position (e.g. <*Geschneke> for <Geschenke> ‘presents’). Mistakes of this kind will be collected, but it is not possible to categorize them systematically in advance in order to give feedback.

Third, since the intelligent dictionary works with a limited selection of word forms and their corresponding mistakes, new words that pupils are using frequently should be added systematically to the system, e.g. <*hausaufgabe> for <Hausaufgabe> (‘homework’).

Fourth, a challenging task will be to teach the system that some words are spelled correctly, although the dictionary does not recognize it as a correct word, because the word is newly coined, e.g. <Partyhütchen> (‘party hats’).

4 Discussion and Conclusion

In this study we described the concept and system behind the project IDeRBlog and its workflow for students and teachers. A first evaluation with texts from students shows promising results for future evaluations and enhancements of the intelligent dictionary. Our findings indicate that the categorized mistakes correspond to the mistakes children actually make when writing texts. This is an important finding since the categorization of mistakes is based on a complex system of phenomena and categories. Due to the proven stability of the system, new words and mistakes can be added in order to make the intelligent dictionary more powerful and to enable meaningful analyses in the future.

The implementation of this system in schools has great advantages for teachers and pupils: First, the teacher easily obtains a qualitative analysis of the spelling errors of his/her pupils. Until now, a qualitative analysis of the spelling mistakes in essays has had to be done by hand. This is a time-consuming process that also requires a lot of knowledge (for details see [19]). Second, based on the results of the qualitative analysis the teachers know which orthographic areas are the most problematic ones for a specific pupil or for a whole group. By using the platform and retrieving a qualitative analysis of the number and percentage of spelling mistakes per category, the teachers are supported in planning their orthography classes. Furthermore, the system can be used for evaluating the progress of pupils in acquiring German orthography in general or specific categories of it. The advantage for the pupils is that they can improve their spelling in an attractive digital environment that is based on scientific findings. Therefore, the platform is a trend-setting development and application in the field of e-learning and learning analytics with a methodology for a specific subject - namely German orthography.

The system also has great advantages for researchers in spelling acquisition: For the first time, analyses of the words pupils actually use and of their spelling errors will be possible. This can have a huge impact on understanding acquisition processes and, consequently, on modifications of teaching and learning approaches.

Although there are many promising advantages, there is also a drawback: The advantages can only be realized if a large community uses the system frequently, because analyses for pupils can only be carried out if there are enough correctly and incorrectly spelled words. This aspect also limits the interpretation of these preliminary findings. The data basis is limited to 60 short texts from 3rd graders. Therefore, the presented findings show the possibilities of this system, but provide no real empirical evidence. Since our system is developed for pupils aged from 8 to 12 years, we should also add texts from 4th to 6th graders. The more schools and classes use the system, the deeper the insight into the spelling process and orthographic competence of German-speaking users will be. This is expected to happen in the course of 2016/17, when the whole system is offered to the public.

In order to improve the intelligent dictionary for the users, the system should grow by adding words and their word forms as well as their possible mistakes based on the texts written by the pupils. Further we plan to predict the performance of students, make personalized recommendations for exercises provided by our platform and benchmark the performance of the student’s progress in spelling acquisition.