Introduction

This study describes a manual approach for identifying genres in relatively small and specialised corpora. It then uses this approach to investigate a representative sample of English language teaching material in order to evaluate the support provided to second language (L2) writers. Quantifying the representation of both text types and genres can show which genres are included in the same text type category and therefore to what extent the existing categories lack homogeneity and conceal variation. These aims derive from the need to raise genre awareness in educational settings, especially where second language learners are involved, and the need to sensitise those involved in corpus building to the great impact of early-stage decisions such as the approach chosen for text classification. These needs are described in more detail in the following sub-sections.

Educational Settings: The Importance of Genre Awareness for L2 Writers

In the development of the child as a social being, learning “how to behave linguistically” happens indirectly “through the accumulated experience of numerous small events” (Halliday 1978, p. 9, 13). Exposure to these events, however, is usually limited for foreign language learners. They need to combine linguistic competence and communicative competence based on pragmatic knowledge in order to interact successfully (Kroll 1990; Paltridge 2001; McNamara and Roever 2006; Sifianou 2006; Yasuda 2011; Taguchi 2012).

Learning a genre means learning how to “participate in the actions of a community” (Miller 1984, p. 165). The notion of genre-knowledge is important to L2 writing teachers because “it stresses that genres are specific to particular cultures and communities, reminding us that our students may not share this knowledge with us” (Hyland 2004, p. 54). The combination of linguistic ability and sociocultural awareness is a common interest in both ‘genre analysis’ and ‘pragmatic competence’ studies in L2 educational settings. Whether measuring this type of learner competence or trying to find ways to raise it, both fields have been affected by Hymes’ (1972) seminal work on communicative competence. The recognition that using language effectively means being able to convey and interpret meanings in a social situation, combined with raised sensitivity towards L2 learners’ lack of exposure to such naturalistic settings, has greatly affected second language research. It has also led to the birth of Interlanguage pragmatics (ILP), a sub-field “that focuses on second language learners’ knowledge, use and development in performance of sociocultural functions in context” (Taguchi 2012, p. 1).

Despite the traditional focus of pragmatics on spoken discourse, there are views now that see written/spoken discourses as not “dichotomous but complimentary sides of a single pragmatic competence” (Ifantidou and Tzanne 2012, p. 49). The realisation that spoken and written discourses, apart from well documented differences, do share a number of similarities in structure as well as in pragmatic and/or communicative functions and associated linguistic and rhetorical features has brought genre analysis and the pragmatics of discourse closer (Tardy and Swales 2014). In addition, the overall willingness by pragmatics to study language in connection to specific goals and needs coincides with the SFL view of genres. As Tardy and Swales (2014) point out, the overall meaning of the book title ‘How to Do Things with Words’ by Austin (1962) is similar to Martin’s (1985b, p. 250) definition of genres, ‘Genres are how things get done, when language is used to accomplish them’.

In international English language proficiency exams genre awareness is required and often taken for granted (Hamp-Lyons 2003). According to official guidance (Cambridge English First handbook 2016, p. 28), candidates should be able to recognise basic generic elements such as a purpose for writing, identify the target reader and choose the appropriate style or tone. Similarly, in the Cambridge English Advanced handbook (2016, p. 31), they are warned against the possible treatment of different tasks based on topic similarity: “A pre-learned response on a similar topic is unlikely to meet the requirements of the specific task in the exam.” It is clear that the exam tasks expect candidates to decode information that will affect their writing by reading the prompts.

The question identifies the context, the writer’s role and the target reader, which helps the candidate to choose the appropriate register. It is also very important that students learn to distinguish between the various task types required by the questions in Part 2. Even though a candidate may display an excellent command of the language, an answer will only achieve a high mark if all the above factors are taken into account. (Cambridge English Proficiency handbook 2016, p. 23)

Keeping in mind that in testing contexts time and word constraints as well as raised anxiety levels can affect response (Hamp-Lyons 1991; Ferris 2008; Gebril and Plakans 2015), one would expect material with a focus on exam preparation to provide explicit guidance that helps candidates take control of the texts they need to write. In reality though, mainstream teaching material seems to lack important genre and register advice or provide information which is unclear and confusing.

This lack of explicit guidance has been observed in various educational settings. Rothery (1985, p. 80), for example, talks about the ‘hidden curriculum’ and Christie (1984, p. 20) argues that schools often fail to show pupils explicitly what the nature of each genre is. She even doubts that teachers are always aware of the types of writing they are teaching. Along the same lines, Hyland (2003, p. 151) criticises the lack of explicit guidance on writing different types of texts in courses where instruction is not genre-based and where “learners are expected to acquire the genres they need from repeated writing experiences or the teacher’s notes in the margins of their essays”. While interviewing university tutors, Nesi and Gardner (2006) realised that although tutors appreciated argument, structure, clarity and originality in texts, they could not be explicit about the ways these characteristics could be realised or recognised in text.

Corpus Building: The Importance of Text Classification

Various researchers state that identifying categories of texts during the corpus compilation process is an important consideration. They also see the need for informing future users of the criteria used for this categorisation (Biber et al. 1998; Biber 2010; Lee 2001; Sharoff 2015). Using corpora that are not categorised according to genres can cause problems due to the lack of homogeneity and may produce misleading findings especially regarding features that are associated with style (Stamatatos et al. 2001; Biber 2006). Despite the need to classify texts in corpora in terms of genres, and the active interest in automatic genre identification processes (Stamatatos et al. 2000, 2001; Santini 2006), Sharoff (2015, p. 306) observes that “getting a suitable set of genre labels is surprisingly difficult. The major corpora disagree with respect to their genre inventories”.

This inconsistency could be attributed to four main factors. The first is the lack of consensus in the literature on what the terms ‘genre’, ‘text type’ and ‘register’ actually represent. The second is the lack of a systematic and widely accepted method of categorising large groups of texts (Stubbs 1996; Lee 2001; Passoneau et al. 2014; Sharoff 2015). The third possible factor is related to size. The advantage of analysing a large set of texts as opposed to one or two texts has been one of the strongest arguments in favour of corpus linguistics for years. There is a belief that the bigger the corpus the better, because with more data researchers can be more confident about their findings, especially when these involve statistical analyses. Manual genre identification in such large corpora can be a complicated and time-consuming process. Classifying texts according to topic or text type, on the other hand, involves less risk-taking and can be done by people with no linguistic background.

It has been shown, however, that specialised corpora can be much smaller and that foreign language researchers tend to use smaller corpora, which are easier to compile and analyse but designed according to strict criteria and created for specific research (Henry and Roseberry 1996; Ooi 2001; Tribble 2001; Pravec 2002; Flowerdew 2005; O’Keeffe et al. 2007). For Flowerdew (2004, p. 19), the optimum size depends on what the corpus contains and what is being investigated. Biber (1990) also supports smaller corpora representing the full range of variation, as opposed to larger general corpora, when the focus of analysis is text variation. Handford (2010, p. 259) adds some advantages of specialised corpora for genre analysis: the fact that in these corpora the analyst is “probably also the compiler and does have familiarity with the wider socio-cultural dimension in which the discourse was created” (Flowerdew 2004, p. 16) and the fact that adding tags manually, to enrich genre analysis, is a procedure feasible only on small corpora (Flowerdew 1998).

Representativeness, ‘the extent to which a sample includes the full range of variability in population’ (Biber 1993, p. 243) is a critical factor of a quality corpus (Biber 1993; McEnery and Wilson 2001; O’Keeffe et al. 2007; Sardinha and Pinto 2014) and especially in the case of specialised corpora, size becomes a secondary issue (Lee 2010, p. 114). Therefore, now that the value of smaller and specialised corpora is recognised, building such corpora can start with manual identification of genres and classification of texts based on genre categories.

A final reason that could explain the scarcity of corpora classified by genre is the need for complete texts in order to identify genre markers (Biber and Conrad 2009; Handford 2010). The widespread tendency to use excerpts of texts, especially in large corpora, leaves no room for genre identification based on the texts.

‘Genre’, ‘Text Type’ and ‘Register’: Attempting to Clear Up Terminology

This section aims to clarify the confusion among the terms ‘genre’, ‘text type’ and ‘register’, reviewing their use in the literature. Even though this is not an exhaustive review of what has been said on the matter, it is indicative of the lack of consensus in the field. In spite of different views concerning ‘genre’ even among the genre scholars, resulting in three Genre Schools (Hyon 1996), these scholars have consistently worked with genres and the main idea in their definitions of genre is similar. Their differences refer mostly to the focus of analysis and the way they link genre to teaching practices. Our interest here lies not in pinpointing these differences once more but in investigating the ways the terms ‘genre’, ‘text type’ and ‘register’ have been used in relation to each other by scholars in general.

‘Genre’ Versus ‘Text Type’

There are researchers who mainly use the term genre in their work (e.g. Halliday 1978; Swales 1990, 2004; Bhatia 1993; Devitt et al. 2004; Halliday and Matthiessen 2004; Nesi and Gardner 2012) and others who have used both the terms text type and genre (e.g. Biber 1988, 1989; Stubbs 1996).

Some researchers have expressed their views on the distinction between the two terms but not all of these views share the same basis for the distinction. Biber (1988, p. 70, 1989, p. 6), for example, sees genres as defined and distinguished on the basis of systematic non-linguistic criteria and text types on the basis of strictly linguistic criteria, that is, similarities in the use of co-occurring linguistic features. For him, text types are groupings of texts that share linguistic features irrespective of genre. Based on this conceptual framework he has found that the same genre can differ greatly in its linguistic characteristics and that different genres can be quite similar linguistically. From his perspective “genre distinctions do not adequately represent different text types.”

Paltridge (2001, pp. 63, 123) defines the term text type as patterns of discourse organisation that occur across different genres, such as description, narrative, instruction and explanation, but later on he refers to a letter, a story and an advertisement as genres too. Knapp and Watkins (1994) link the term genre to language processes such as describing, explaining, arguing and the term text type to texts seen as products or things such as reports, expositions and stories. They encourage teaching genres as processes rather than products, as the generic features remain consistent and can be applicable to all text types written by students. From this perspective commonly used text types often deploy several genres. For Glasswell et al. (2001), genre is driven by functional purpose whereas text type is affected by mode (text form). They point out that the purpose can change even if the type of the text remains the same, and they use letters to explain:

Letters may be written to make complaints, to argue a point, to recount an event, to make an explanation, to tell an anecdote, or to advertise a product. In short, letters may have different purposes and, thus, the structuring of these texts and their lexicogrammatical resources will differ significantly, regardless of the fact that each will still be considered a letter in terms of layout and transmission. (pp. 2–3)

Even though this view differentiates text type from genre, it gives precedence to the term genre (seen as functional purpose) over text type (seen as text form). It is therefore quite different from Biber’s distinction explained above, and from his preference for studying text types instead of genres. Cummings (2003, p. 194) sees text types as components of genre. He labels narrative, description, exposition, dialogue and monologue as genre categories and novel, travel brochure, article, conversation and oration as text types.

Stubbs (1996, p. 12), looking back at categorisations that have been proposed based on text types, says that “none is comprehensive or generally accepted”. Paltridge (1996, p. 237) notices that “the terms ‘genre’ and ‘text type’ seem to have been conflated with the term ‘genre’ being used to include both of these notions” while Lee (2001, p. 41) calls the term text type an “elusive concept” and finds it redundant to have two terms which cover the same ground.

‘Genre’ Versus ‘Register’

For Halliday (1978, p. 111) “a register can be defined as the configuration of semantic resources that the member of a culture typically associates with a situation type. It is the meaning potential that is accessible in a given social context”. A situation type is characterised by three factors: what is happening, who is taking part and what part the language is playing. These three variables are called ‘field’, indicating the type of social action, ‘tenor’ referring to role relationships and ‘mode’, denoting the symbolic organisation. According to him these three variables, taken together, determine the ‘register’, that is, “the range within which meanings are selected and the forms which are used for their expression” (Halliday 1978, p. 31). Exploring register means attempting “to understand what situational factors determine what linguistic features.” Later on, he defines register as “a syndrome of lexicogrammatical probabilities” (Halliday 1992, p. 68).

Derewianka (1996, p. 47) also sees register as the configuration of field, tenor and mode and associates genre with purpose. She sees the notions of genre and register as inseparable. She considers an awareness of the genre as the basis for the prediction of the overall organisation of the text (stages) and an awareness of the register as the basis for the prediction of the language features that generally characterise such a text.

Martin (1993, p. 156) sees genre as a layer above register and as encompassing register. For him, “genre is a way in; it works to raise awareness, and it works in a way which register analysis alone had not been able to work before”. Thompson (2014), sees genre as register plus communicative purpose and suggests that we see register as cloth and genre as garment:

the garment is made of an appropriate type of cloth or cloths, cut and shaped in conventional ways to suit particular purposes. Similarly, a genre deploys the resources of a register (or more than one register) in particular patterns to achieve certain communicative goals. (p. 52)

In a detailed description of the term register, Conrad and Biber (2001, p. 3) explain that “register distinctions are defined in non-linguistic terms, including the speaker’s purpose in communication, the topic, the relationship between speaker and hearer, and the production circumstances”. This way of identification sounds similar to the Systemic Functional Linguistics approach, which sees register as the configuration of field, tenor and mode. The notion of purpose, however, is associated with genre in the SFL perspective, and genre is a greater notion that encompasses register.

In practice, however, Biber’s view of text classification based on register does not necessarily involve communicative purpose as a criterion. Conrad and Biber (2001, p. 3) distinguish between a specialized register “corresponding to the extent to which the register is specified situationally” and a general register in which texts “tend to exhibit a wide range of linguistic variation”. In general register categories such as conversation or newspaper language there are texts with all sorts of different relations between speaker and hearer as well as purposes within the same text category. Lee (2001, p. 40), commenting on Biber’s (1989) Multi-Dimensional approach says that this classification “is at the level of individual texts, not groups such as ‘genres’, so texts which nominally ‘belong together’ in a ‘genre’ (in terms of external criteria) may land up in different text types because of differing linguistic characteristics”. It is interesting that in other work (Biber 2006, p. 11), the term register is used with “no implied theoretical distinction to genre.” In Biber and Conrad (2009), one can find an extensive explanation of the difference in perspectives which again makes the association of purpose with register. In his review, Bhatia (2012) says that this reminds him of a typical definition of genres and still feels that the boundaries between register and genre remain blurred.

Theoretical Positioning and Methodology for Text Classification in the WriMA Corpus

Reviewing the suggested approaches to genre and register, it was clear that even though there was some common ground among the researchers, they did not always justify why they chose one perspective over another, and that in some cases inconsistencies in the use of these terms occurred within the same work or in work by the same researcher over the course of time. Therefore, hopes for a consensus seem a bit utopian at present. Reviewing the literature on the use of these terms from time to time, however, can assist our understanding of similarities and differences among perspectives, so that we can define more consciously what exactly we are analysing as researchers or teaching as educators and how we choose to do that.

We chose to follow the SFL theoretical principles in this work due to their clarity and consistency observed in the description of the terms of interest here, namely genre, text type and register. In line with Martin (1993) and Thompson (2014), we see genre as encompassing register where the communicative purpose together with field, tenor and mode determine the overall structure of the text, what is going to be written or discussed, the way language is going to be affected by the relations between the writer/reader or the speakers, and the most appropriate text form. We see the notion of text type as related to mode, denoting text form, in line with Glasswell et al. (2001), and thus, as one of the variables determining register. Essays or letters are text types in this sense, not genres. In this view, genre and register are inseparable (Finegan and Biber 1994; Derewianka 1996) but genre can be studied on its own when the focus of the investigation is on purpose and structure. Register studies, on the other hand, need to consider genre (in this view, associated with purpose) in the choice of texts to be included in a category as it can alter the language used. In agreement with Thompson’s perspective mentioned above, it is the garment (genre) that will determine which type of cloth is suitable (register).

The WriMA corpus (Melissourgou and Frantzi 2015a), used in this study, is a pedagogical corpus. Meunier and Gouverneur (2009, pp. 179–201) have defined this type of corpus as “a large enough and representative sample of the language, spoken and written, a learner has been or is likely to be exposed to via teaching material, either in the classroom or during self-study activities”. It contains 1151 model writing answers (253,025 tokens), from textbooks and educational websites. Both types of sources have a focus on EFL examinations. Model answers included in the corpus target various international English language exams as the scope was to investigate genres in this context not the preparation for a specific exam.

As representativeness is a critical factor of a quality corpus (Biber 1993; McEnery and Wilson 2001; O’Keeffe et al. 2007), we have tried to collect writing answers from as many sources of both types as possible. In total, 93 different sources have provided the content, with 56% coming from the Web and 44% from printed books. Texts included in the corpus are also marked for proficiency level in accordance with the Common European Framework of Reference for Languages (CEFR) levels (Council of Europe 2001) and refer to levels B1 up to C2 as these are the most intensively tested levels. Information included in the metadata of the corpus refers to the prompts, the CEFR levels, the name of the targeted examination and the source. Texts were initially classified according to text type categories, the widespread classification policy used in this type of educational material. After the identification of genres, representation of texts was measured for both text type and genre categories.
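The two-level measurement described above (texts per text type, then genres within each text type) can be sketched in a few lines of Python. This is only an illustration: the records below are hypothetical placeholders, not WriMA data, and the `(text_type, genre)` pair layout is an assumption about how such metadata might be stored.

```python
from collections import Counter, defaultdict

# Hypothetical (text_type, genre) labels for illustration only; not WriMA data.
records = [
    ("essay", "Expository Essay"),
    ("essay", "Discursive Essay"),
    ("essay", "Expository Essay"),
    ("letter", "Formal Apology Letter"),
    ("report", "Data Report"),
]


def representation(records):
    """Count texts per text type and per genre within each text type."""
    by_type = Counter(text_type for text_type, _ in records)
    genres_in_type = defaultdict(Counter)
    for text_type, genre in records:
        genres_in_type[text_type][genre] += 1
    return by_type, genres_in_type


by_type, genres_in_type = representation(records)
for text_type, total in by_type.most_common():
    # The number of distinct genres shows how heterogeneous a text type category is.
    print(f"{text_type}: {total} texts, {len(genres_in_type[text_type])} genre(s)")
    for genre, n in genres_in_type[text_type].most_common():
        print(f"  {genre}: {n} ({n / total:.0%})")
```

A text type category that contains several genres in comparable proportions is the kind of category that, in the terms used here, lacks homogeneity and conceals variation.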

The study relies on the presentation of these texts in educational settings as model answers. It has been shown that learners tend to use model texts productively and often seek them out when not provided (Tardy 2006; Melissourgou and Frantzi 2015b). Any misrepresentation in these sources, that is, any possible failure by writers to actually adhere to genre conventions, has not been addressed and this should be taken into consideration. We see texts and task prompts written by professional writers (already proofread and presented in well-known material) as a reliable source in order to gain insight for genre identification and specify the scope of this research narrowly within the EFL context and language proficiency exams. In this context, model texts usually try to simulate real-world genres and communicative purposes but there are a number of predefined restrictions on the prompts (word-limits, language use according to proficiency level, expected pragmatic awareness as affected by age and distance from naturalistic settings). Therefore, inferences drawn from this study are not meant to necessarily apply to real-world genres or even genres used in other educational settings.

The first and most important step in genre analysis is the identification of genres. If done carelessly, it can jeopardise the results of the analysis. The process of genre identification described here (Fig. 1), has mainly been based on functional purpose (Martin 1985a) and register variables, field, tenor and mode, as described by Halliday (1978).

Fig. 1 Genre identification in the WriMA corpus

As the corpus metadata includes the task prompts, contextual information is retrieved without the need for reading the text itself. The prompt also defines the text type, asking for example, specifically for an ‘essay’ or a ‘letter’. There is, however, need for reading individual texts where the prompt leaves choice as to the development of the texts. This may happen for example in some argumentative essay prompts where the writer is free to choose between exposition or discussion.

Coutinho and Miranda (2009, p. 42) call the function mechanisms used to identify genres ‘markers’: “the marker is a semiotic mechanism (of any sort) that functions like any clue or indication of the updating of a generic parameter with distinctive value”. They identify two big classes of genre markers: the self-referential and the inferential. Examples of the first type of markers are the labels used in the prompts in this study (e.g. article, essay, letter). Phrases from the body of the texts which help the experienced reader activate genre knowledge and distinguish categories of texts are inferential markers. These markers are seen in relation to their position in the text and the associated rhetorical move. In expository essays, for example, writers are called to put forward a viewpoint and provide arguments in defence of or as objections to the proposition made. They need to justify their position and reach a conclusion. A common structure of the expository essay, observed also in this context, is the following:

  • ^Introduction of the issue ^Thesis statement ^Arguments (2–3) ^Conclusion

Tasks asking for a discursive essay usually ask the writer to discuss two opposing views presenting arguments for and against and then form an opinion based on these arguments. During the analysis the following structure has been observed:

  • ^Introduction of the issue ^Argument in favour of one side ^Argument in favour of the other side ^Summary of pros and cons + Conclusion in favour of one side

Prompts leaving choice for an expository or discursive essay were the hardest part of the identification process: first, because both belong to the larger argumentative genre family and consequently share a lot of features, and second, because essays were the largest text type category in our corpus. However, for the majority of the texts the rhetorical organisation described above was evident. Inferential markers such as the contrasting connectors ‘on the one hand’ and ‘on the other hand’, used to open two separate paragraphs in the main body of the essay, were an initial indication of discursive essays, where two sides of the argument are presented almost equally. First person pronouns at the beginning of the text, however, combined with verbs introducing an opinion (e.g. ‘think’, ‘believe’) or similar phrases (e.g. ‘in my view/opinion’), were clues for an expository essay, where writers express their view and then try to justify it with arguments.
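The identification in this study was done manually, but the two marker cues just described can be illustrated as a rough first-pass heuristic. The sketch below is an assumption-laden illustration, not the procedure used for WriMA: the function name, the marker inventory and the rule that the first and last paragraphs are introduction and conclusion are all hypothetical simplifications.

```python
import re

# Illustrative opinion markers taken from the examples in the text; a real
# inventory would be larger and would need checking against the corpus.
OPINION_MARKERS = re.compile(
    r"\b(I (think|believe)|in my (view|opinion))\b", re.IGNORECASE
)


def guess_essay_genre(paragraphs):
    """Return 'discursive', 'expository' or 'undecided' for a list of paragraphs."""
    # Assumption: the main body is everything between the first and last paragraph.
    body_starts = [p.strip().lower() for p in paragraphs[1:-1]]
    opens_one = any(s.startswith("on the one hand") for s in body_starts)
    opens_other = any(s.startswith("on the other hand") for s in body_starts)
    if opens_one and opens_other:
        return "discursive"
    if paragraphs and OPINION_MARKERS.search(paragraphs[0]):
        return "expository"
    return "undecided"


essay_a = [
    "Some people see celebrations as a waste of money. Others disagree.",
    "On the one hand, parties bring families together.",
    "On the other hand, they can be very costly.",
    "To sum up, moderate spending seems the better choice.",
]
essay_b = [
    "In my opinion, nature is now under threat.",
    "Firstly, forests are disappearing.",
    "In conclusion, action is needed.",
]
print(guess_essay_genre(essay_a))
print(guess_essay_genre(essay_b))
```

Texts that return 'undecided' would still need to be read in full, which is consistent with treating such markers as an initial indication rather than a decision rule.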

The rest of the essay genres were easier to identify from the prompts. Those that clearly asked for the reasons of a problem/situation were identified as Factorial and those that asked for the consequences of a problem/situation were identified as Consequential (Nesi and Gardner 2012). Prompts presenting an issue and asking for both reasons and consequences created a smaller category of texts named Factorial and Consequential. Then, there were prompts which asked for a description of a person/object/event and prompts that clearly asked for solutions to a problem. Texts with such prompts were identified as ‘descriptive’ and ‘solutions to a problem’ essays respectively. Examples of prompts for each category in the essay genres are shown below:

  1. Nature used to be a force that humans struggled against to survive. Today, it is nature’s survival that is threatened. To what extent do you think this is true? Discuss, giving specific examples. (Expository Essay)

  2. Some people think that spending a lot on holding wedding parties, birthday parties and other celebrations is just a waste of money. Others, however, think that these are necessary for individuals and the society. Discuss both views and give your opinion. (Discursive Essay)

  3. Levels of youth crime are increasing rapidly in most cities around the world. What are the reasons for this? (Factorial Essay)

  4. The authorities in your city propose spending a large part of their budget on the creation of an environmental park. What do you think would be the effects of creating an environmental park? (Consequential Essay)

  5. As countries have developed there has been a trend towards smaller family sizes. Why does this happen? How does this affect society? Give reasons for your answer and include any relevant examples from your own experience or knowledge. (Factorial and Consequential Essay)

  6. What can be done to reduce the pollution of the environment in modern cities? (Solutions to a problem Essay)
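The prompt-based distinctions illustrated by these examples can be sketched as an ordered set of cue rules. This is a hypothetical illustration only: the cue patterns are assumptions loosely modelled on the wording of the example prompts, and the classification reported in this study was carried out by reading the prompts, not by pattern matching.

```python
import re

# Hypothetical cue patterns, loosely modelled on the example prompts above.
REASONS = re.compile(r"\b(reasons for this|why does this happen|causes? of)\b", re.I)
EFFECTS = re.compile(r"\b(effects?|consequences?|how does this affect)\b", re.I)
SOLUTIONS = re.compile(r"\b(what can be done|solutions?)\b", re.I)
BOTH_VIEWS = re.compile(r"\bdiscuss both views\b", re.I)


def classify_essay_prompt(prompt):
    """Map an essay prompt to a genre label, or flag it for manual reading."""
    has_reasons = bool(REASONS.search(prompt))
    has_effects = bool(EFFECTS.search(prompt))
    if has_reasons and has_effects:
        return "Factorial and Consequential Essay"
    if has_reasons:
        return "Factorial Essay"
    if has_effects:
        return "Consequential Essay"
    if SOLUTIONS.search(prompt):
        return "Solutions to a problem Essay"
    if BOTH_VIEWS.search(prompt):
        return "Discursive Essay"
    # Prompts that leave the development open cannot be decided from the prompt.
    return "needs manual reading"


print(classify_essay_prompt(
    "Levels of youth crime are increasing rapidly in most cities around "
    "the world. What are the reasons for this?"
))
print(classify_essay_prompt(
    "What can be done to reduce the pollution of the environment in modern cities?"
))
```

Under these assumed rules, a prompt such as the first example (‘To what extent do you think this is true?’) matches no cue and is returned as needing manual reading, which mirrors the cases where the text itself has to be consulted.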

Even though two texts may share the same purpose the register variables may be different. The purpose may have been for example ‘to offer solutions to a problem’ but the text type asked may have been an essay, an article or a letter which have different targeted readers (tenor) and different text formats (mode). In this case three different genre categories have been created: Solutions to a problem Essay, Solutions to a problem Article and Solutions to a community problem Letter.

In other cases, the mode was the same but the purpose was different as, for example, in Letters to the Editor, a text category name commonly used in the literature. In our view, this has been a broad classification based on tenor, placing emphasis on a specific addressor-addressee relationship and underestimating the importance of purpose and the variation it can cause in the register. A letter in this category can be written to inform about new facilities in the area, to praise the editor for a well-written article or to complain about a change that affects the public. As purpose has been the main criterion for classification in this study, these letters were allocated to different genres.

Reports were divided into two categories as the basis for reporting is completely different for the two tasks. The first type (named ‘Data Report’) asks students to report and summarise based on data provided in the rubric, presented in graphs, while the second type (named ‘Personal Observation Report’) asks them to report based on personal experience. This difference was considered important, having the potential to alter basic features of the language used, and led to the creation of two separate genre categories for reports.

  1. The table shows the Proportions of Pupils Attending Three Secondary School Types Between 2000 and 2007. Summarize the information by selecting and reporting the main features and make comparisons where relevant. (Data Report)

  2. A group of American students is coming to visit your school in a few months. They have never been to your town before so their coordinator, Mr. <Surname>, has asked you to write a report about interesting places worth visiting in the area. Write a report describing the places and explain why you think they may be of interest to the group of visiting students. Write your report to Mr. <Surname>. (Personal Observation Report)

The procedure was more complicated in one category. Several textbooks included the letter/email text types in the same prompt as if the same task could be written as a letter or an email. The model answer provided was not defined in terms of text type category leaving the impression that it could be used in either case, letter or email. The same was observed in official examination guides when describing the text types needed:

AN EMAIL/A LETTER is written in response to the situation outlined in the question. Letters and emails in the Cambridge English: First Writing paper will require a response which is consistently appropriate in register and tone for the specified target reader. Candidates can expect to be asked to write letters or emails to, for example, an English-speaking friend or colleague, a potential employer, a college principal or a magazine editor. (Cambridge English First 2016, p. 30).

There was confusion over text type in this case which did not occur with other text types. These model answers were carefully studied for inferential markers, but no obvious difference was evident. For that reason, the criteria for grouping such texts have remained consistent, but we have not distinguished between letters and emails unless all prompts in a genre category asked for a letter only or for an email only. In the results section, a category may therefore appear as, for example, the Application Letter/email, meaning that this genre was presented in the material under a double text type label, or as the Reference Letter, meaning that this genre was found only under the letter label (Table 1).

Table 1 The representation of text types and genres in English language educational material (as represented in the WriMA corpus)

While the use of the formal/informal letter category name is an indication of variation, formality is just one of many features that could relate to letters. Even though formality is affected by writer/reader relations, treating it as the one and only factor distinguishing texts within the letter category conceals important variation caused by different purposes. Letters were thus classified according to the criteria used throughout the study, and the terms formal and informal were retained only in cases where the purpose was the same but the tenor was different (e.g. Formal Apology Letter versus Informal Apology Letter).

An appropriate name is then suggested which best illustrates the basic features and requirements of the genre for less experienced writers. This naming has been based on purpose and mode. Terms for purpose have been chosen because we share the view that purpose is the “prototypical criterion for genre identity” (Swales 1990, p. 10). Terms for mode, in the sense of text types (Glasswell et al. 2001), are widely known and can trigger some subconscious knowledge of the genre at hand. Both terms function as signposts for writers with no expert knowledge of genres.
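The naming convention described above can be illustrated with a minimal sketch. It assumes that each task has already been annotated manually for purpose, mode and tenor; the function name `genre_labels`, the tuple layout and the sample tenor values are hypothetical and are not part of the study's procedure. The sketch builds a label from purpose and mode and adds the tenor term only where the same purpose and mode occur with more than one tenor.

```python
from collections import defaultdict


def genre_labels(tasks):
    """Return one genre label per task.

    `tasks` is a list of (purpose, mode, tenor) tuples (a hypothetical
    annotation format). The label is "<purpose> <mode>"; the tenor term is
    prefixed only when the same purpose/mode pair occurs with several tenors.
    """
    # Collect every tenor observed for each purpose/mode pair.
    tenors = defaultdict(set)
    for purpose, mode, tenor in tasks:
        tenors[(purpose, mode)].add(tenor)

    labels = []
    for purpose, mode, tenor in tasks:
        parts = [purpose, mode]
        if len(tenors[(purpose, mode)]) > 1:
            # Same purpose, different tenor: keep the formal/informal term.
            parts.insert(0, tenor)
        labels.append(" ".join(parts))
    return labels


# Illustrative annotations only; the tenor values are assumptions.
tasks = [
    ("Apology", "Letter", "Formal"),
    ("Apology", "Letter", "Informal"),
    ("Reference", "Letter", "Formal"),
]
labels = genre_labels(tasks)
print(labels)
```

In this sketch the two apology letters keep their tenor term because they share a purpose, while the reference letter is labelled by purpose and mode alone.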

Text Types and Genres in the WriMA Corpus

The text type categories found in the material are presented together with the categories based on genre (Table 1). The actual numbers of texts included in each category are also given as percentages so that the reader can easily see the coverage of specific text types and genres in the material.

The new ‘genre categories’, with a more student-friendly naming, offer a more accurate view of what is required in these tasks. Both the initial and the final classification in the table offer information about the representation of each category in the educational material used for this corpus. The seven text type categories in which texts were initially classified in the corpus are represented in the educational material as follows: 36% for Essays, 17% for Formal letters/emails, 15.3% for Reports, 13.9% for Informal letters/emails, 7.6% for Articles, 6.8% for Stories and 3.4% for Reviews. Out of seven initial text types (nine if letters and emails are seen as different categories), this process revealed 33 different genres: seven for Essays, two for Reports, one each for Reviews and Stories, four for Articles, twelve for Formal letters/emails and six for Informal letters/emails.
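As a quick consistency check, the figures reported above can be tallied in a few lines. This is only a sketch: the dictionary names are illustrative, the values are copied from the text rather than from the corpus itself, and the combined letter figures are simple sums of the reported values.

```python
# Share of texts per initial text type, in percent, as reported in the text.
text_type_share = {
    "Essays": 36.0,
    "Formal letters/emails": 17.0,
    "Reports": 15.3,
    "Informal letters/emails": 13.9,
    "Articles": 7.6,
    "Stories": 6.8,
    "Reviews": 3.4,
}

# Number of genres identified within each text type, as reported in the text.
genres_per_text_type = {
    "Essays": 7,
    "Formal letters/emails": 12,
    "Reports": 2,
    "Informal letters/emails": 6,
    "Articles": 4,
    "Stories": 1,
    "Reviews": 1,
}

letter_types = ("Formal letters/emails", "Informal letters/emails")

# Rounded to one decimal place to avoid floating-point noise in the sum.
total_share = round(sum(text_type_share.values()), 1)
total_genres = sum(genres_per_text_type.values())
letter_share = round(sum(text_type_share[t] for t in letter_types), 1)
letter_genres = sum(genres_per_text_type[t] for t in letter_types)

print(f"Total share of texts: {total_share}%")
print(f"Total genres: {total_genres}")
print(f"Letters/emails: {letter_share}% of texts, {letter_genres} of {total_genres} genres")
```

The percentages add up to the whole corpus and the per-type genre counts add up to the 33 genres reported; the last line shows how many of those genres fall under the two letter/email categories.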

Overall, the classification of texts according to genre following the above method has shown a wide range of genres derived from the existing categories, especially in letters. The representation is uneven in both text type and genre categories, but this could possibly relate to the actual exam tasks in this context. It is not the aim of this study to investigate how actual tasks used in the past relate to the text type categories presented in teaching material, or whether the selection of model texts is based on similar measurements, on estimations or purely on instinct. A contrastive study, however, using two corpora, one for past ‘writing’ papers (prompts) and one similar to the WriMA corpus, would be interesting and helpful towards the evaluation and improvement of material. Of course, researchers in this case would need access to past papers from all examinations covered.

Our main purpose here has been to show quantitatively that the present labelling of categories conceals important information. The range of the derived genre categories casts doubt on the similarity of the language used under the broad initial categories. Classification of texts according to genre provides narrower categories, allowing even finer variation to be identified. In addition, the more informative labelling of these categories can function as a shortcut to exam candidates’ understanding of the task requirements and is strongly recommended to material writers. Let us not forget that the role of exam tasks is to test learners’ competency, but the role of educational material is to prepare learners and provide explicit guidance towards that goal.

Conclusions

Due to the fuzziness around the term genre and the time and effort genre identification may involve, this process is often avoided both in educational material and in corpus building. Even though many researchers have stressed the importance of this stage and see the need for explaining the criteria used for text classification in corpora, the process of genre identification and the linguistic framework on which it has been based are rarely described in the literature. Despite the active interest in automatic genre recognition/identification, there is as yet no widely accepted method of categorisation based on genres.

In order to classify texts more effectively and avoid getting lost in the maze of perspectives on terminology, an approach for identifying genres has been suggested and described in detail. It is a manual identification process and therefore more appropriate for small corpora. It is, however, less time-consuming than approaches which require reading whole texts, as it exploits the information provided in task prompts and makes use of whole texts only in cases where this information is not adequate. It is, thus, more appropriate for corpora with rich metadata. The solid linguistic framework of Systemic Functional Linguistics on which it has been based, combining functional purpose and register variables as well as self-referential and inferential genre markers, provides a reliable method for grouping texts that actually represent a genre. The additional stage of ‘naming’ these genre categories, mainly based on purpose and mode, is meant to guide anyone who needs to understand genre requirements, such as novice second language writers.

This method is suggested mainly for pedagogical corpora, but the major part, which relies on prompts in order to define the genre category, could also be implemented in learner corpora. It could be used to identify and assess pragmatic and sociolinguistic competence, the assessment of which is a type of error correction that has largely been underestimated and neglected in learner corpus studies.

The identification and measurement of both text types and genres in teaching material has shown that text type classification is too general and can in some cases misguide teachers and learners, as it ignores the importance of communicative purpose. It also perpetuates the unnecessary burden of solving the mystery of the generic requirements of the writing task. The classification and labelling of categories based on text type has prevailed in exam writing tasks, causing a subsequent adoption of the same labelling in educational material. Learners, however, run the risk of assuming that the requirements of the task set in examinations will be similar to those of the tasks they have been extensively taught under the same label (e.g. ‘essay’). This treatment of texts in material perpetuates the misconception that “different ‘genres’ are quite simply different ‘text types’ each characterised by certain pre-determined textual features” (Tang 2006: introduction). Using model texts is an important stage in the learners’ immersion in the genre (Knapp 1989; Derewianka 1990; Flowerdew 1993; Charney and Carlson 1995; Hyland 2004; Tardy 2006, 2009), and teaching material should keep providing such guidance while avoiding overgeneralisation in broad text categories.

Classification based on genres appears to be more helpful for learners and a more solid basis for linguistics researchers, as it takes into consideration many more variables that can cause variation among texts. The findings support Lee’s (2001, p. 37) view that “genre is the level of text categorisation which is theoretically and pedagogically most useful and most practical to work with” and the view expressed by Freedman and Medway (1994, p. 2) that textual form is merely a surface trait of the underlying regularity among genres.

A lot of work needs to be done in order for the prevailing attitude to text classification and labelling to change and move towards a genre perspective. Educational practice in primary and secondary schools in Australia and New Zealand has already set an example of genre-based teaching (Board of studies 1998; Glasswell et al. 2001; Knapp 1989, 2002; Knapp and Watkins 1994; Martin 1985a). Based on the same SFL principles, its new implementation in this context could additionally benefit from the recent contribution of corpus linguistics to genre studies and the insight it has offered. The whole process in Australia and New Zealand was, however, a national effort, as it involved public schools. The present context is a different area, with independent testing bodies, teaching material editors and publishers. Foreign language learners can greatly benefit from an explicit rather than implicit knowledge of genres. We encourage both researchers and educators to place more emphasis on teaching that helps learners develop pragmatic skills and promotes communicative competence.