What are they Talking About? A Systematic Exploration of Theme Identification Methods for Character Speech in Dramatic Texts

Reiter, Nils; Willand, Marcus

doi:10.1007/978-3-476-05886-7_20

Nils Reiter² &
Marcus Willand³

Part of the book series: Germanistische Symposien ((GERMSYMP))

2010 Accesses

Zusammenfassung

This article systematically explores different methods to describe and compare the thematic content of character speech in drama. The problems here are two-fold: The determination of the theme of a text segment is challenging, as it might be subtle or deliberately hidden. Secondly, the evaluation of this determination is difficult, because gold standards do not exist. We approach these problems by comparing the output of different systems to previously stated expectations (postulates) that are partially derived from drama theory and history. Results show that all systems currently have difficulties reaching high performance numbers. Topic modeling and systems based on a general-purpose dictionary (GermaNet) achieve comparable performance.

You have full access to this open access chapter, Download chapter PDF

Classifying Written Texts Through Rhythmic Features

Drammar: A Comprehensive Ontological Resource on Drama

Dialogue analysis: a case study on the New Testament

Article 02 May 2019

1 Introduction

This article investigates theme detection in character speech of dramatic texts. It does not aim to contribute to drama(-historical) research, but introduces two methodological innovations to computational literary studies: We propose a procedure for the systematic comparison of text-analytic methods even if a gold standard does not exist or is difficult to create. We employ this procedure for the comparison of several implemented systems that assign some notion of ‘theme’ to a text segment. Henceforth, we use the term ‘theme’ to describe the content of character speech and the term ‘topic’ exclusively for the outcome of topic modeling. ‘Theme’ represents the goal, and ‘topic’ the outcome of one specific method (that may or may not achieve this goal).

The lack of reference corpora with annotated categories that are of relevance for literary studies is a methodological bottleneck for the development of text processing tools for literary texts. Existing and publicly available corpora allow researchers to test and/or train their tools, and to assess their quality. The existence of such corpora in other domains and with non-literary categories has been the driving force behind the progress made in computational linguistics over the past years. Creating reference corpora for literary phenomena, however, is a challenging task, for multiple reasons.^{Footnote 1} One reason is the nature of literary texts, as they are often deliberately ambiguous and open for different interpretations. In addition, literary categories are hard to define exactly,^{Footnote 2} which is often only uncovered during annotation.^{Footnote 3} In this article, we would like to offer a different perspective on the issue: Knowledge that has been established in literary studies might not be defined as exact as in natural science, but with the right procedure it can still be employed for method development.

The procedure we propose can be outlined as follows: We first formulate a set of postulates about dramatic texts. These postulates combine a structural element of the dramatic text with a non-interpretive hypothesis about the thematic content of this text element. For instance, we postulate that continuous copresence of characters leads to similar thematic content of their dramatic speech. The procedure does not require these postulates to be true for every drama. Though, we do maintain that they are plausible assumptions for a large number of dramatic texts and that results obtained by tuning or training on these postulates would be suitable for poetological or drama historical statements.

Dramatic texts are well suited for this analysis, because they are strongly structured by nature (compared to, e.g., prose texts) and this structure is encodable in a machine-readable fashion. The speech of characters is unfiltered by a narrator, the entire text is segmented into acts and scenes. Unsurprisingly, the dramatic structure has always played a role in the literary analysis of dramas. Many theoretical predictions are rooted in the drama structure that makes both the formulation of general postulates and their operationalization straightforward.

Most work in the past has been either analyzing the dramatic structure or the content of dramatic language. For social network analysis, one typically extracts co-presence information (which characters share the stage how often?) from the texts.^{Footnote 4} On top of the relational information encoded in networks, Algee-Hewitt uses betweenness centrality to make interesting observations.^{Footnote 5} Genre analysis, as for instance done by Schöch,^{Footnote 6} on the other hand, ignores most of the structure and just takes the text as a whole: All characters’ voices are merged into one.

There are only a few works that actually combine structure and content information: Nalisnick/Baird have employed sentiment analysis on the character speech to determine the turning point in the relation between characters (e.g., between Hamlet and his mother).^{Footnote 7} Bullard/Ovesdotter Alm describe corpus-linguistic investigations into two Scandinavian authors and determine how certain character classes are represented linguistically.^{Footnote 8} Karsdorp et al. focus on one specific character relation and describe a system to determine the most probable romantic partner for characters.^{Footnote 9}

This paper is structured in the following way: In Sect. 2, we discuss conceptually how we are going to compare systems on a task for which a gold standard does not exist. Section 3 introduces the theme detection systems. In Sect. 4 we discuss four postulates about dramatic texts, against which we compare the systems. Section 5 outlines the experimental setup and introduces formal and notational basics. In Sect. 6, each postulate is covered by one subsection, discussing its operationalization and experimental results. Section 7 concludes.

2 Comparing Systems without a Gold Standard

In order to compare the performance of different text analysis systems, one would typically presuppose the existence of a gold standard, against which the systems can be tested. While having a gold standard would be ideal, creating one is difficult in this case for two reasons that are related with the target phenomenon ‘theme’.

Defining a list of themes as annotation categories would in principle allow us to annotate text segments (e.g., character utterances) with such categories. Cast as a supervised classification problem, it would be straightforward to train a classifier that detects and distinguishes these themes. During the application of this classifier, however, we would only be able to detect the topics as they are defined in the gold standard. We are then not able to discover new themes in the corpus, in addition to the usual dependency on the training data. The requirement of seeing negative data during training (i.e., texts that do not refer to a given theme) makes even binary classification attempts (training one classifier for one theme) difficult, as the absence of a theme may be even harder to annotate than its presence. All this is also related to another challenge: Themes in literary texts are to some extent interpretative, in particular if the theme is deliberately hidden or its detection depends on special historical knowledge. Except for very broad and general themes, it is therefore unlikely that any kind of gold standard will be created in the near future.

The core idea in this procedure is to formulate a number of postulates that are derived from existing scholarly knowledge about dramatic texts. These postulates are assumed to be true in general without having to be true for every single text. Such postulates do need to be operationalized at some point, and structural properties of both the systems in comparison and the postulates make some more suited than others: Postulates that state relations between one variable and another can be operationalized as correlations in a straightforward manner, for instance. Once this is done, the different theme detection systems can be applied onto a corpus and their output be compared with the postulates. However, the interpretation of such results cannot be expected to be trivial, as we are comparing multiple systems with multiple postulates.

Conceptually, this is similar to two existing concepts from natural language processing (NLP): In the paradigm of distant supervision, an un-annotated corpus is annotated automatically, based on structured data that is in some way linked to the corpus. In Husby/Barbosa, training data is created by linking blog articles to Wikipedia domains via freebase entities,^{Footnote 10} which are in turn used as topic labels for the blog articles. After the data has been established, standard supervised classification algorithms can be applied. The second concept is extrinsic evaluation. NLP tools are often used in pipelines: The output of a part of speech tagger is fed as input to a dependency parser (the downstream task). If there was no gold standard for part of speech tags, one could evaluate a part of speech tagger by comparing the performance of a dependency parser with and without the input of the part of speech tagger (assuming we have an evaluation method for the dependency parser).

If formalized in an appropriate way, existing literary knowledge can actually be re-used as a ‘distant gold standard’. Systems could be optimized to reproduce decisions and distinctions made by literary scholars.

3 Theme Detection Systems

We will be using two kinds of systems in this article: Dictionary-based systems use a dictionary to detect words in text that belong to certain keywords in this dictionary. Dictionary-based systems require the existence or creation of a dictionary, which pre-determines the themes one is able to find in texts. While defined themes might still appear in unexpected places, it is impossible to detect completely new themes. The findings, however, can be interpreted in a straightforward way, as resulting numbers represent the (relative) amount of words belonging to a certain keyword in a text segment.

Latent Dirichlet Allocation (LDA, referred to here as „topic modeling“, TM) is an unsupervised statistical method that aims at uncovering textual patterns by inspecting co-occurrence of words, typically in large corpora. Using topic models, one can discover new and unexpected themes in a corpus. However, because documents in topic models are represented as distributions over distributions over words (see below), they are difficult to interpret. As other researchers have discovered,^{Footnote 11} many of the resulting topics do not represent themes at all, but formal or structural properties of the documents. In addition, one has to specify the number of topics in advance.

To give an overview, Table 1 shows the systems and system variants in comparison. Word classes shows the set of parts of speech that have been taken into account for the analysis. The dictionary-based systems only work with content words (nouns, verbs, adjectives) and ignore function words. Avg. size shows the average size (and standard deviation) for keywords (dictionary methods) or training documents (topic modeling). Total size gives the total number of entries (in the dictionaries) or number of training documents (for topic modeling).

Table 1 Overview of the theme detection systems. A detailed description of the columns can be found in the text

Full size table

3.1 Dictionary-based Approaches

A dictionary in this context is defined as a set of keywords with associated entries. The entries are semantically related to the keyword, but not necessarily synonymous. The keyword “theater”, for instance, would contain entries like “stage”, “actress”, “playing” etc. The dictionaries cover different word classes, historic, stylistic and spelling variations. They contain all entries as lemmas.

We will compare three different dictionary-based systems. The major difference between the systems lays in the nature and source of the entries. All systems assign a score for each keyword in the dictionary, based on the number of words that belong to this keyword. This score is normalized for the segment length, and, if word set sizes are differing largely, by dictionary size.

Custom Dictionary

In the first variant of this method, we use a dictionary that has been collected by a domain expert (one of the authors of this paper) for the specific purpose of plays. As described in Willand/Reiter,^{Footnote 12} we manually created a dictionary for each of the five keywords that we consider exceptionally relevant for the analysis of dramatic texts from our research period (1730 to 1930): “family”, “war”, “love”, “reason” and “religion”. Each of the sets contains 60 to 115 words, which sum up to 444 entries in total. Time-span related meaning variation has been taken into account. The creation of the sets was done in an iterative manner: Starting from the word “love” we looked for words in it’s immediate context with the character speech that were then checked for their contexts.^{Footnote 13}

An illustrative example (taken from ibid.) on the use of these dictionaries is shown in Fig. 1. Each corner in the ‘spider web’ represents an entry in the dictionary field and each node represents the extent to which a character uses words that are grouped under this keyword. As Fig. 1 shows, a custom dictionary is capable of giving information about the thematic distribution of character speech in the dimensions defined in the dictionary. Apparently, the lovers mostly talk about love and the parents mostly talk about family (with respect to the five thematic fields). The crucial point of this method is that one has to have an idea of relevant themes before doing any analysis. This allows domain experts to incorporate their previous knowledge but has the downside that it limits what we can find to what we have defined before.

Enriched Custom Dictionary

As second variant of the custom dictionary approach, we use the previously built custom dictionary as a set of seed words and add words that are distributionally similar. This similarity is measured as distance over word vectors created with the tool word2vec,^{Footnote 14} integrated as an R package wordVectors.^{Footnote 15} The word vectors have been trained on a corpus of German literary fiction.^{Footnote 16} Although a different literary genre (prose), we selected this corpus for its size and because it covers the historic period in question. The enriched dictionaries contain the same words as before, but additionally all words that have a (cosine) similarity of $0.5$ or higher. This results in between 136 and 212 entries per keyword. In total this dictionary contains 893 entries.

GermaNet Dictionary

As a third, technically similar option, we test the use of GermaNet 11^{Footnote 17} as a dictionary source. GermaNet is a concept network, primarily used for (computational) linguistic purposes (and developed by computational linguists). A concept in GermaNet may or may not be lexicalized: Systematic gaps in the hierarchy have been filled with concepts, even if no German word exists for this concept. The relations between concepts are lexical relations like hypernymy/hyponymy or various forms of meronymy. The hypernymy/hyponymy-relations can be used to form a concept tree based on inheritance. While there is a technical top node in this tree, the concepts directly below this node are grouped into so-called semantic fields (see GermaNet Section in the Appendix).

Due to the concept hierarchy, one could in principle operate on different granularity levels. As the other systems are using a limited (and rather small) number of dimensions, we chose to use the semantic field classification, and extracted all realized words that belong to these fields (from anywhere in the hierarchy). This results in a dictionary consisting of 54 keywords, each of which containing 85 to 18,070 lemmas. In total this dictionary contains 142,815 entries.

3.2 Topic Modeling

Topic modeling has been introduced by Blei et al.^{Footnote 18} The core idea is to model topics as distributions over words (i.e., a word has a certain probability to be in a topic), and to model documents as distributions over topics. A document is then a combination of various topics, each assigned a probability. When training a topic model, one needs to fix the number of topics the model should consider.

Within digital humanities, topic models are widely used as a means to discover thematic structure of text corpora. Underwood/Goldstone use topic models to study the history of the Publications of the Modern Language Association (a scientific journal);^{Footnote 19} Rhody studies figurative language in poetry,^{Footnote 20} Quamen/Hjartarson investigate the personal correspondence of two Canadian writers,^{Footnote 21} and Schöch studies dramatic genre (and sub-genre) using topic models^{Footnote 22} (this list is not exhaustive).

For our experiments, we employ topic models in two variants, each with a low and high number of topics (20 vs. 100 topics). In the first variant, we train the topic models on full text level: Each dramatic text is taken as a single document, containing the voices of all characters as one (stage directions and speaker names are removed). In the second variant, we consider the speech of every character as a new document. The entire speech of Emilia from Emilia Galotti, for instance, is put together in a new document, containing only her speech.

4 Postulates about Dramatic Texts

In this section, we substantiate our understanding of what we call ‘postulates about dramatic texts’, focusing on their motivation. Their operationalization will be discussed in Sect. 6.

4.1 Copresence

Following from the intuition that conversations on stage share a common theme (or set of themes), we expect that two copresent characters (which are characters with shared stage presence) share themes in their speech. This has never been subject to drama-historical postulates or poetological claims but seems plausible for our setting. In addition, it is in line with previous observations we made in former work on the translation of Shakespeare’s Romeo and Juliet:^{Footnote 23} Using the custom dictionary-approach to compare Shakespeare’s characters with those from German plays, we found that in dramatic genres with small ensembles—like the Bürgerliche Trauerspiel—the character speech of the most dominant characters is similar in terms of thematic distribution. That seems to be a result of ensemble size. The smaller it is, the greater the probability that characters spend time together on stage. We also found that copresence and themes of speech seems to be linked in single pairs of characters. Striking examples are in this case butlers and servants, which are often peripheral or satellite characters: Their only network connection is the one to their lord.^{Footnote 24} In the examples we give, this connection is usually not the only one, but it is the strongest. E.g., in Schillers Kabale und Liebe, the distribution of themes in the speech of Präsident von Walter and his butler is very similar. The same applies for Julieta and the nurse in Shakespeare’s Romeo and Juliet and even for Hettore Gonzaga and Marinelli in Emilia Galotti.

4.2 Scene Boundaries

The postulate about scene boundaries leads back to poetological controversies about the function of scenes that started with the classical unities of action, time and place. The latter was not introduced by Aristotle, but by J. C. Scaliger and adopted by the French doctrine classique in the 16th and 17th century. The neo-Aristotelian ideal of what Volker Klotz calls “geschlossenes Drama”^{Footnote 25} postulates a causal development of plot and the liaison des scènes.^{Footnote 26} This means that two adjacent scenes are connected by at least one character that remains present. This is supposed to ensure the unity of action. Gottsched—an assertor of the French tradition—describes it as follows:

Schlüßlich muß ich erinnern, daß die Auftritte oder Scenen in einer Handlung [meint “Akt”] allezeit mit einander verbunden seyn müssen: damit die Bühne nicht eher gantz ledig werde, bis die gantze Handlung aus ist. Es muß also aus der vorigen Scene immer eine Person da bleiben, wenn eine neue kommt, oder eine abgeht: damit die gantze Handlung einen Zusammenhang habe. Die Alten sowohl, als Corneille und Racine haben dieses fleißig beobachtet: aber in unsern deutschen Stücken geht es gemeiniglich sehr bunt durcheinander; und daher kommt es grosentheils, daß die Fabeln nicht recht zusammen hängen.

[Finally I must note that the scenes in an act must always be connected with each other: so that the stage does not become completely empty until the whole action is over. There must always be one person from the previous scene there when a new one comes, or one leaves: so that the whole act has a connection. The old ones as well as Corneille and Racine have observed this diligently: but in our German plays it is generally very colorfully mixed up; and that’s why it comes largely that the fables don't really hang together.]^{Footnote 27}

Even though Gottsched hints at the contemporarily widespread touring companies—that he wanted to be replaced by institutionalized theaters –, he also foresees a drama-historical development that would start two decades later. It basically breaks with the French doctrine classique and sets Shakespeare as the new poetological landmark. This development—which peaks in Sturm und Drang poetics—leads to our postulate about scene boundaries. It combines the structural unit ‘scene’ with the themes of character speech in a scene. If many characters remain on stage, the themes prevalent in the character speech remain similar, so that as Gottsched says, “die Fabeln […] zusammen hängen [the threads are connected].” In contrast, if there is a major exchange of characters between the scenes, the themes spoken in those scenes are likely to change as well.

4.3 Tragic or Happy Ends: Die or Marry

In her 2005 Novel Tenor of Love, the Canadian feminist poet, author and university professor Mary di Michele writes: “Somebody always has to die onstage, die or marry; that’s the only difference between a comedy and a tragedy as far as the world knows.”^{Footnote 28} In fact, there are a lot of differences between tragedies and comedies and most of them—just as the function of scenes—are object to manifold controversies for more than 2,000 years now. Some of them are plot-related and point to the divergent endings, other differences are bound to the dramatic characters. Their historicity, moral qualities, social status and decorum of speech are the best known. Among others Scaliger, Opitz, Gottsched and Lessing discussed the question, whether middle-class characters have tragic qualities or solely higher ranked characters as kings do.^{Footnote 29}

This postulate brings into focus the ending of the drama. The expectation here is that the sketched differences in events (death or marriage) are reflected in different themes appearing in the character speech. Commenting or describing someone’s wedding implies the appearance of wedding-related themes, while the death of a character can be expected to be discussed in different themes.

4.4 Dynamic and Static Protagonists

Characters in drama can be considered as either dynamic or static.^{Footnote 30} In antique tragedy, e.g., actors wore masks (persona) that expressed a fixed, stable character property, like a specific emotion. This mask represented the character’s set of values and the way they would deal with what fate intended for him.^{Footnote 31} A change of mind would neither correspond to the anthropological model of the Greek nor to their poetological understanding of dealing with the dramatic conflict. During the course of drama history, the idea of a static character got more and more replaced by the attempt to form characters in line with modern individuals. For German tragedy, this individualization of characters is usually stated to become a broader phenomenon with the rise of the Bürgerliche Trauerspiel in the second half of the 18th century:^{Footnote 32} Characters react to the dramatic conflict by changing their mind and values.

This postulate aims at the protagonist of the play and the idea that the thematic design of a dynamic character’s speech is more diverse than the speech of a static character. Since our corpus covers one century starting 1730, we expect it to contain mostly dynamic protagonists and aim for automatically determining their dynamicism.

5 Experimental Setup

5.1 Formalization

Each theme detection method produces a vector for a given input text. For the dictionary-based methods, this is the (normalized) number of words in the respective set. The methods using topic modeling return a vector of probabilities with one element for each topic. We make no assumptions on interpretability of the theme vectors, nor do we inspect the topics manually. Comparing system predictions on texts or text segments typically involves a notion of distance between the vectors. For all experiments, we employ Euclidean distance to measure distance between vectors and Pearson correlation^{Footnote 33} to measure correlations.^{Footnote 34}

A key component in the formalization of the postulates is the stage presence matrix.^{Footnote 35} This binary matrix represents scenes in columns and characters in rows. If a character has stage presence during a scene, the cell is marked with a 1 (one). All other cells contain 0 (zero). In our analysis, we equate ‘being present on stage’ with ‘speaking on stage’. This is a simplification, as mere presence on stage is not encoded in a machine-readable fashion in our corpus. Interestingly, this distinction can already to be found in Gryphius’s plays, e.g., in Carolus Stuardus, written about 1650: Not all characters present in a scene do actually speak, and so Gryphius differentiates in the Dramatis Personae of his plays between characters that enter stage ‘as speakers’ and those that enter ‘as mute’ characters.^{Footnote 36}

$$M = \left[ {\begin{array}{*{20}c} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ \end{array} } \right]\;$$

(1)

To give an example, we assume the stage presence matrix shown in Eq. 1. Character 1 is therefore present on stage in scenes one and three. Individual columns or rows can be extracted: The vector for Scene 2 is $\left[ {0,1,1,0\left] { = M} \right[,2} \right]$ (second column), and the vector for character 3 is $\left[ {0,1,0\left] { = M} \right[3,} \right]$ (third row).

$$C_{i} = \{ x|M\left[ {x,i} \right] = 1\}$$

(2)

Based on this matrix, we can define the set of characters in a scene ($C_{i}$) as shown in Eq. 2: All characters that have a 1 in the respective column. $C$ will be the set of all characters in the play. We will use both the matrix and set representation for operationalizing the postulates.

5.2 Corpus

All texts are taken from the TextGrid repository^{Footnote 37} and have been written or translated into German between 1730 and 1930. We selected texts that can be classified (mostly) uncontroversially as either being a tragedy or a comedy. A table that lists the entire corpus can be found in the Appendix.

5.3 Preprocessing

The texts and their structure have been converted into stand-off annotations in the UIMA pipeline framework.^{Footnote 38} Stage directions and character speech have been processed separately, to ensure that inserted stage directions do not break the syntactic structure of the surrounding sentences. This has been achieved with the DramaNLP package.^{Footnote 39} The texts have been lemmatized and part of speech-tagged, using the Stanford part of speech-tagger^{Footnote 40} and the Mate lemmatizer^{Footnote 41}, integrated into the DKpro core package^{Footnote 42}. Speaker attributions have been made using manually specified mappings in difficult cases and string overlap heuristics in other cases.

After preprocessing, the texts have been exported into CSV files and are analyzed with RStudio, using the R package DramaAnalysis.^{Footnote 43}

6 Experiments

In this section, we will describe in detail the operationalization of the postulates introduced above and the results, separately for each postulate.

6.1 Copresence

Postulate: Copresence of characters leads to similar thematic profiles of their speech.

Operationalization

Given the scene presence matrix $M$ for a play, we first create a copresence matrix by multiplying the presence matrix with its transpose: $M_{c} = MM^{T}$. This (quadratic, symmetric) matrix contains the characters as rows and columns, and the number of scenes in which two characters are co-present as cell values. As a next step, this matrix is converted to a distance matrix by inverting and normalizing with the total number of scenes ($s$, equivalent to the number of columns in the stage presence matrix): $M_{dist} = 1 - \left( {M_{c} /s} \right)$.

This distance matrix contains values between 0 and 1. Pairs of characters that appear in more scenes together have lower values (lower distance) than characters that appear rarely together. Character pairs that are not co-present at all have the maximal distance of $1$, character pairs that are co-present in all scenes have a distance of $0$. Characters that speak less than 1,000 words are excluded from this analysis. As an example, Table 2 shows the distance matrix for Emilia Galotti. Emilia and Orsina do not share a single scene, which is marked by the maximal value in the corresponding cell. Emilia and Appiani share one single scene, which gives them a distance score of 0.98.

Table 2 Distance matrix for Emilia Galotti

Full size table

We calculate the correlation between distances based on co-presence and Euclidean distances between theme vectors. A positive correlation indicates that co-presence and shared topics in character speech go hand in hand, a negative correlation indicates the opposite. A value of zero indicates no correlation between the two. The correlation metric reflects how strong this postulate is realized in texts, and there is no way of formulating an exact expectation. High correlations are unrealistic to expect, as exceptions to the postulate come to mind relatively quick. Our discussion focuses on the range between 0.1 and 0.5.

Results

Figure 2 shows the percentage of dramas that fulfill the copresence postulate, for various correlation thresholds (on the x-axis). For instance, just over 75 % of the plays have a positive correlation (strictly $> 0$) when using the GermaNet dictionary method.

In general, we see similar curves for all methods, although the difference between the best and worst method is up to 10–15 percentage points (for low correlations). For correlations below $0.5$, dictionary methods achieve the better performances. For higher correlations, the custom dictionary method becomes one of the worst, while enriched and GermaNet are still achieving top performance rates. The four topic modeling variants are outperformed in most settings. It is a surprising result that methods that cover only a fixed set of themes (custom and enriched dictionary) do perform on a similar level as methods that are in principle covering all themes (GermaNet dictionary and topic modeling).

The fact that all methods achieve low correlation scores across the board suggests that this concept (of some copresence leading to similar themes) is hard to grasp for automatic methods. One reason is that characters are usually co-present with more than one other character throughout the play—the entire text a character speaks is actually a mixture of contributions to thematically more or less connected situations (scenes), in which the content is also influenced by other present characters. Another difficulty is that copresence is not limited to two present characters—it’s not unusual that three or more characters interact on stage simultaneously. This polyphony of character speeches influences the speech of each character. None of the methods is designed to cope with these problems.

6.2 Scene boundaries

Postulate: The more characters are exchanged at scene boundaries, the more different the two scenes are thematically.

Operationalization

To our knowledge, two possible ways of quantifying the change of stage personnel have been published: Marcus proposes the use of the Hamming distance on a binary stage presence matrix.^{Footnote 44} Given two symbol sequences of equal length, the Hamming distance is the number of positions in which the two sequences are different. Applied to two scene vectors extracted from a scene presence matrix, this is the number of characters that leave or enter the stage at a scene boundary.

Hamming distance can be scaled to the interval between zero and one in two ways: Dividing by the total number of characters ($\left| C \right|$) allows comparison across dramatic texts, as a play with a large dramatis personae can exchange more characters than one with a small cast (Eq. 3). Trilcke et al. propose the drama change rate as a new measure to quantify this change.^{Footnote 45} Drama change rate is almost equivalent to the Hamming distance, but it uses the number of characters in the two adjacent scenes to scale (Eq. 4), not the entire cast. In all variants, the measure is minimal if all characters remain on stage. Scaled Hamming distance is only maximal if all characters in the play are affected, e.g., by one half leaving and the other half entering the stage. Drama change rate is maximal if all characters that are present in one of the two scenes are exchanged, irrespective of their proportion at the full dramatis personae.

$$H_{i} = \frac{{\left| {C_{i} \vartriangle C_{i + 1} } \right|}}{\left| C \right|}$$

(3)

$$r_{i} = \frac{{\left| {C_{i} \vartriangle C_{i + 1} } \right|}}{{\left| {C_{i} \cup C_{i + 1} } \right|}}$$

(4)

Draxler describes the scenic difference (szenische Differenz) between two neighboring scenes as the total number of characters in the drama minus the number of characters present in both scenes (Eq. 5).^{Footnote 46} The resulting number is maximal (i.e., equal to the number of characters in the play), if not a single character is present in both scenes (the intersection becomes zero). It is minimal if all characters in the entire play remain on stage.^{Footnote 47} The scenic difference can be scaled by dividing by the total number of characters in the play.

$$s_{i} = \left| C \right| - \left| {C_{i} \cap C_{i + 1} } \right|$$

(5)

The difference between the measures can first be characterized as one of scope: Scaled Hamming distance directly expresses how many percent of the dramatis personae are exchanged at this boundary. Drama change rate is a local measure, ignoring whether the change concerns 90 or 10 percent of the dramatis personae. Scenic difference gives a number that is relative to the total number of characters in the dramatis personae. Only if all characters in the play are present in either one of the scenes ($C = C_{1} \cup C_{2}$) or the number of remaining characters is zero ($C_{1} \cap C_{2} = \emptyset$), the measures produce the same number. The difference between the measures grows with the number of characters in the dramatis personae, and with the stability of the stage personnel.

It is noteworthy that Draxler’s scenic difference sometimes produces counter-intuitive results: Consider two scene boundaries as an example. A single character is entering the stage at the first, while a group of characters is entering at the second. The intersection of characters across the boundaries will remain the same at both boundaries, leading to the same scenic difference—although the group entering at the second boundary might be as large as the entire set of characters already on stage.

Figure 3 shows plotted Hamming distance/change rate/scenic difference (dashed lines) and thematic distance values (solid line) for the Lessing play Emilia Galotti and the Schiller play Jungfrau von Orleans. The vertical lines depict act boundaries. Broadly speaking, the measures make similar movements at most scene boundaries. Act boundaries show higher rates. Scenic difference and Hamming distance values are generally in a smaller interval, as they express change rates relative to the entire set of characters.

The difference between Hamming distance and drama change rate (that results from different scaling methods) becomes apparent between scenes 17–20 in Die Jungfrau von Orleans (bottom plot). Drama change rate shows a continuous maximal line (as all characters are exchanged at the boundaries), while the scaled Hamming distance differentiates between 15 and 20 % of the characters being exchanged.

According to the postulate, we expect a correlation between the thematic distance (solid line) and one of the change rates (dashed lines). We do not expect a perfect correlation (this becomes obvious in the examples). In some cases, the same set of themes is continued across scene boundaries (at least according to this theme detection method) even if characters are exchanged (e.g., across the first act boundary in Emilia Galotti). In some parts, however, we see a similar movement (e.g., the peak in the last act of the Jungfrau von Orleans).

Technically, we calculate Pearson correlations between one of the measures at each scene boundary, and the Euclidean distance between the scene before and after the boundary. Scene boundaries that are also act boundaries are not removed, as we expect them to behave in line with the postulate.

Results

Figure 4 presents the results in a similar way as for the previous postulate. Again, the x-axis shows the correlation threshold and the y-axis the number of texts for which we find a positively correlation according to the postulate. The left plot shows the positive correlations with the drama change rate, the right plot the positive correlations with scenic difference.

The most striking observation here is that the results are better differentiated when correlated with the scenic difference. In terms of method ranking, however, the results are similar: The custom dictionary method clearly performs worst, with scores over 10 pp. behind the system ranked second last. To a great extent, this is probably related to the limited size of only five discriminatory topics. That they might be relevant for a lot of plays does not implicate, that they are the only themes discussed in plays. Themes like death or nature are simply not recognised by the custom dictionaries. Enriched dictionary performs a bit better, but is still outperformed by the others, at least up to a correlation coefficient of 0.4. The different variants of topic modeling perform all similar—depending on the correlation level, one or the other variant is slightly in the lead. A varying number of dimensions does not make a big difference, although topic models with more dimensions are slightly in the lead.

The GermaNet dictionary method performs best if correlated against the scenic difference. One possible explanation could be that the ‘counter-intuitive’ scores this measure sometimes produces (see above) are actually in line with the postulate. Characters entering the stage might not make a big difference in terms of discussed themes.

A further possible reason for generally low correlation scores might be the diverging understandings of the function of scenes in drama history. While boundaries between acts in almost all plays mark a change of time or place, boundaries between scenes are known to be ambiguous.^{Footnote 48} That is mainly because the sketched French and German post-Gottsched tradition in poetics and playwriting differ in terms and concepts. In the French tradition coming from Scaliger, a play is structured by scène and acte. The German tradition since 1750—which leads back to Shakespeare—distinguishes Auftritt, Szene and Akt.^{Footnote 49} The French acte corresponds to the German Szene and Akt by the mentioned time/place-change.^{Footnote 50} In our understanding, a scene is not necessarily a plot-related category. It is less likely to be plot-related in the French scène, where entries (Auftritte) or exits of only one character cause scene boundaries. But it is more likely to be plot-related in Shakespeare’s tradition, where scene boundaries demark a change of the complete setting and the new location might imply other themes, even though the same characters are on stage.

6.3 Tragic or happy ends: Die or marry

Postulate: Tragedies and comedies are distinguishable based on the theme of the ending.

Operationalization

This postulate focuses on the ending. We therefore extract the ending from each text by considering only the last act, the last scene, or the last two scenes (determining the actual ending of a literary text is surprisingly difficult^{Footnote 51}). We consider the entire text in this ending, and disregard individual characters. Using the different methods, we generate theme vectors for each ending, and cluster the resulting vectors hierarchically, using group-average linkage. We analyze the resulting cluster purity at multiple stages in the hierarchical clustering process, which results in differing number of clusters.

Figure 5 shows a hierarchical clustering of several plays as an example, based on the last act. Tragedies are marked with a T, comedies are marked with a C. In the bottom part, three of the four tragedies are grouped together nicely. In the top part, a comedy and a tragedy have been judged as similar. The vertical lines in the plot indicate possible stages to extract the actual groupings. Extracting the clusters at the short-dashed line yields two clusters, extracting at the dotted line yields three and the long-dashed line four clusters.

$$P\left( {C_{i} } \right) = \frac{{\mathop {{\text{max}}}\limits_{j} p_{i}^{j} }}{{\left| {C_{i} } \right|}}$$

(6)

Cluster purity is calculated as shown in Eq. 6, with $C_{i}$ being a cluster, and the numerator the number of instances of the majority class in the cluster. In essence, cluster purity is the percentage the majority class in each cluster has. If one would use the clustering to assign class labels to each instance, cluster purity expresses the classification accuracy. Since our corpus contains more tragedies than comedies, a majority baseline would achieve a classification accuracy of $54.72\%$ (which is equivalent with the cluster purity when putting all endings in a single cluster). In the example in Fig. 5, the clusters at the short-dashed line (left) have an average cluster purity of $0.75$, the two other stages an average purity of $1$.

The reasoning behind looking at multiple numbers of clusters is that both tragedy and comedy are broad, generic genres that combine many sub genres. A system might be better at distinguishing more fine-grained classes (as the one in Fig. 5), which would still be indicative of its performance for theme detection.

Results

Figure 6 shows the results for the different systems and ending variants. In each plot, we see the number of clusters on the x-axis, and the achieved cluster purity on the y-axis.

When considering the last act as the ending, two groups of systems can be discerned: The systems based on topic models, and the dictionary-based systems. The former outperform the latter clearly. For the distinction between the genres, the low-dimensional topic model (TM) trained on entire texts performs best. A jump in performance can be observed when distinguishing between six genres: The topic models trained on entire texts (both variants) achieve a cluster purity of about 90 %. Increasing the number of groups further has only minor influence on this score.

The picture changes slightly when considering only the last scene as the ending. The low dimensional TM trained on entire texts achieves high purity early on. Interestingly, the higher dimensional TM achieves a similar performance only for five and more genres. Character-based topic modeling is generally lower, but in a similar range.

If we look at the last two scenes understood as the ending, we observe different results. The gap between the best and the second-best system has been substantial (over ten percentage points) in the previously discussed plots. This gap does not exist in this setting. The topic models perform similarly as the dictionary-based systems. Only for more than five to seven genres, a gap between topic modeling and dictionary-based methods can be discerned.

One conclusion to draw here is that the ‘ending’ of a dramatic text is better represented by either the last scene or last act, but not multiple scenes. A possible explanation could be that the second last scene leads to more heterogeneous themes or introduces ‘non-ending material’. More material might counterbalance this effect, leading again to increased performance when considering the entire last act. But this might not be related to the dramatic ending in particular. The additional text could be distinctive for the dramatic genre without any thematic connection to the prototypical drama endings. In terms of system choice, it is clear that topic modeling performs much better than the other methods.

6.4 Dynamic and static protagonists

Postulate: Themes in the main characters speech are changing over the course of the play.

Operationalization

To test against this postulate, we focus on the protagonist of the play. Moretti points out that identifying the actual protagonist is far from trivial—no uniform definition exist, and more complex definitions are difficult to operationalize.^{Footnote 52} We therefore opt for a very simple heuristic, and select the most ‘talkative’ character, i.e., the character that speaks the most words per play.

In the three plays depicted in Fig. 7, Sara and Johanna are clearly the protagonists in Lessing’s Miß Sara Sampson and Schiller’s Die Jungfrau von Orleans—and in both plays, they are the characters with the most spoken words. There are, however, cases in which human readers do not identify the most talkative character as the main one. In Lessing’s Emilia Galotti, the character speaking the most is Marinelli, chamberlain of the prince. The prince himself, Emilia’s father, or Emilia, who would all be candidates for protagonists, speak, however, less then Marinelli. In line with our attempt of working with the vagueness engrained in literary works, we rely on the fact that this heuristic works in many other cases and identifies—if not the antagonist or at least one of the group of antagonists—a very present and therefore important character.

Given the main character in this fashion, we extract this character’s speech from the first, second, fourth and fifth act. As the key turning point in the character development is supposed to take place during Act 3, we leave out this act. Plays that do not contain five acts are excluded from this analysis.

The segmented character speech is then processed by the theme detection methods, which results in a theme vector for each act (for each main character). We then calculate Euclidean distance between the vectors and compare distances before/after and across the change. A distance before the change would, e.g., be the distance between first and second act, a distance across the change one between the first and fourth act. The expectation here is that for dynamic characters, the distance across the third act should be higher than before or after the third act.

Results

The results of this experiment are shown in Fig. 8. Each plot shows the (normalized) distance values between one act pair from within the first or second half (x-axis) and one act pair across the third act (y-axis). Each symbol represents one drama and its act pair distances. The straight lines are linear trend lines, set in such a way that the gap between the line and the corresponding points is minimal.

A system that is able to fully represent the postulate would be shown as many points appearing in the top-left triangle, to the left or top of the diagonal. This would be caused by distances across the third act being larger than distances before/after the third act. The corresponding trend line would be relatively steep and cross the plot boundary at the top.

It is relatively clear that this is in general not the case—the only systems with (relatively) steep trend lines are the GermaNet-based dictionary system and the high-dimensional character-based TM when comparing distance between act pairs 1, 4 and 4, 5 (bottom left plot). All other systems in all other settings produce gentle trend lines. In general, the methods are seemingly not able to grasp the character development appropriately.

We see two possible reasons for this bad performance: The postulated dynamics of characters might not be present on this granularity level. Developments within acts remain hidden in our analysis but could be perceived much stronger by audience or readers. This might be the case, if a character’s change of mental attitude is expressed in opposing behavior: If an ‘I will not do this’ in the exposition becomes an ‘I will do this’ in the final act, theme identification methods don’t stand a chance. Secondly, most of the poetological claims involving character development focus on tragedies, and it is unclear whether this postulate is supposed to hold on comedies.

7 Conclusions

This article systematically explores different ways of measuring the themes that appear in character speech in dramatic texts. The focus of this article lays on the methodology and the development of such methodology, and not on the interpretation or use of results coming from such methods. Development and use of methodology (or tools in this case) are two distinct ways of doing computational literary studies, and it is important not to confuse the two. In order to be able to judge the quality of any method or tool, one has to have some kind of gold standard against which results are compared—even if the gold standard is not realized in a corpus or annotated data set. And if one is using a certain methodology or tool, one needs to be sure that the tool is actually doing what it is supposed to. Evaluating a tool and interpreting its results at the same time is dangerous, as it may lead to results that are not generalizable. As we have presented here, we believe it is possible to evaluate systems based on a not explicitly annotated “gold” standard.

We have compared different methods that assign themes to segments of dramatic character speech. It may not be surprising that there is no clear winner: The notion of theme is sometimes interpretative, the postulates we defined challenging, the methods not adapted to dramatic texts. We can, however, discern some strengths and weaknesses of the methods:

Topic modeling can be applied to short text segments without severe loss of performance. Topic models that are trained on individual character’s speech perform in general similarly as models trained on entire texts, sometimes even better. We see no general difference in the scene boundary and ending postulates. Only for the copresence postulate, character-based topic models achieve less performance than models trained on the full text: This is surprising, given the fact that the text segments under investigation here are character speeches.
A key difference between the GermaNet dictionary and the other dictionary methods is that the others cover only a fixed set of themes, while GermaNet is much larger and covers (in theory) most of the (modern) German language use. This difference does not lead to huge performance differences, GermaNet dictionary performs similar to the others in the all postulates. The only exception to this trend is the higher performance when correlated with scenic difference in the scene boundary postulate, but this might be more related to scenic difference than to the dictionary.
Higher dimensionality of the topic models does not lead to increased performance automatically. The only case in which the higher dimensional TM performs better than its counterpart is for the third postulate on the last two scenes. In this case, the character-based TM performs better with 100 dimensions than with 20. In all other cases, the high dimensional TM is either outperformed or performs very similar to its counterpart.

The second main motivation for this paper was to experiment with a new kind of gold standard. The postulates we defined are thought to be true in general, which does not mean that they are true for every single text. Since many literary statements are of this kind (if made about a set of texts like a genre), a larger number of postulates can be established, which makes the entire procedure even more fruitful. Calculating correlation scores between postulate and predicted result then incorporates the literary vagueness into the evaluation and does not hide it.

However, there are still multiple reasons to be critical about the postulates we discussed in this article: There might be disagreement on their truth, i.e., one might doubt that a postulate holds true even in general. Apart from our observations in previous work and our intuition, there is little evidence for the first postulate. The role of scene boundaries has been discussed in literary studies (cf. Sect. 4, scene boundaries), but there is no guarantee that playwrights adhere to what normative drama poetics tell them to. The fact that a genre ‘tragicomedy’ exists points out that the comedy/tragedy distinction is not as clear cut as it seems to be. While this might sound like the discussed postulates are not trustworthy, we argue that it is not necessary to believe in them fully. Inspecting different correlation levels, for instance, allows comparison to multiple, and potentially subjective, possible truths.

Canonization in literary studies might be another reason for reservations against using postulates as a gold standard. Many theories and theoretically justified predictions are based on a relatively small and non-representative selection, the literary canon. If one applies a postulate developed on the canon to non-canonical texts, it is indeed questionable whether it can be expected to be true, or even ‘truish’. However, a closer look is necessary: Non-canonical texts might not be different from canonical texts in all aspects. As we have discussed in Reiter/Willand, popular plays, which are mostly not canonized, diverge from canonical plays in many structural aspects.^{Footnote 53} The corpus used in this work, however, consists mostly of canonical plays.

A second possible critique towards the postulates is that they might not be related to themes. We initially believed that the comparisons discussed above include a thematic change, at least to some extent. The results partly suggest that this is might not always be the case; surprisingly, the analysis of character development is such a case. While the exact extent is an open question to all our method testing scenarios, we can still not decide whether the postulate or the operationalization by theme is the crux of the matter. As an outlook for the future, it would be helpful to apply our methods to synthetic texts. These could—to stay with our example—contain thematically ideal dynamic and static characters and thus contribute to the answer to our questions.

Finally, all postulates need to be operationalized before conducting any kind of experiment. Operationalizing a concept always entails decisions that may lead to a misrepresentation of the concept. Some choices might not even be foreseen conceptually, as is best exemplified by the different quantification methods for personnel change for the scene boundary postulate. We believe that the above described operationalizations are reasonable, without saying that they are without alternatives. However, we also believe a scientific discussion about appropriate ways of operationalizing literary concepts is very fruitful, and that experiments as the ones above bring to light weaknesses and strengths of such concepts. The issue of operationalization is also discussed in Reiter/Willand,^{Footnote 54} since we believe that progress on operationalizing literary concepts is crucial for tool and method development.

In sum, this article makes an argument for a systematic methodological development. The lack of annotated reference corpora is a challenge but can be circumvented by thoughtful and reflected use of insights derived in literary studies. Directly comparing methods to solve a fixed set of problems reveals that there is not a single winner-method: Depending on assumptions about drama structure and the exact goal, different methods need to be considered.

Notes

1.
Marcus Willand, “Hermeneutische Interpretation und digitale Analyse. Eine Verhältnisbestimmung,” in: Luisa Banki (ed.), Lektüren. Positionen Zeitgenössischer Philologie, Trier 2017, 77–100.
2.
Cf. Nils Reiter/Evelyn Gius/Jannik Strötgen et al., “A Shared Task for a Shared Goal – Systematic Annotation of Literary Texts,” in: Digital Humanities 2017: Conference Abstracts, Montreal 2017, https://dh2017.adho.org/abstracts/192/192.pdf (retrieved on June 10, 2018).
3.
Thomas Bögel/Michael Gertz/Evelyn Gius et al., “Collaborative Text Annotation Meets Machine Learning: heureCLÉA, a Digital Heuristic of Narrative,” in: DHCommons 1 (July 2015), https://doi.org/10.5281/zenodo.3240591.
; Bernhard Fisseni/Aadil Kurji/Benedikt Löwe, “Annotating with Propp’s Morphology of the Folktale: Reproducibility and Trainability,” in: Literary and Linguistic Computing 29/4 (2014), 488–510, 10.1093/llc/fqu050 (both retrieved on June 10, 2018).
4.
Franco Moretti, “Network Theory, Plot Analysis,” in: Pamphlets of the Stanford Literary Lab 2 (2011), https://litlab.stanford.edu/LiteraryLabPamphlet2.pdf; Peer Trilcke/Frank Fischer/Dario Kampkaspar, “Digital Network Analysis of Dramatic Texts,” in: DH2015 Conference Abstracts, Sydney 2015, https://dlina.github.io/presentations/2015-sydney/sydney.html#/ (both retrieved on June 10, 2018).
5.
Mark Algee-Hewitt, “Distributed Character: Quantitative Models of the English Stage, 1500–1920,” in: Book of abstracts (2017), 119–121, https://dh2017.adho.org/abstracts/DH2017-abstracts.pdf (retrieved on June 10, 2018).
6.
Christof Schöch, “Topic Modeling Genre: An Exploration of French Classical and Enlightenment Drama,” in: Digital Humanities Quarterly 2 (November 2017), 10.5281/zenodo.166356.
7.
Eric T.Nalisnick/Henry S. Baird, “Character-to-Character Sentiment Analysis in Shakespeare’s Plays,” in: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Vol. 2: Short Papers, Sofia 2013, 479–483, http://www.aclweb.org/anthology/P13-2085 (retrieved on June 10, 2018).
8.
Joseph Bullard/Cecilia Ovesdotter Alm, “Computational Analysis to Explore Authors’ Depiction of Characters,” in: Proceedings of the 3rd Workshop on Computational Linguistics for Literature (2014), 11–16, http://www.aclweb.org/anthology/W14-0902 (retrieved on June 10, 2018).
9.
Folgert Karsdorp/Mike Kestemont/Christof Schöch et al., “The Love Equation: Computational Modeling of Romantic Relationships in French Classical Drama,” in: Mark A. Finlayson/Antonio Lieto/Ben Miller (eds.), 6th Workshop on Computational Models of Narrative 45 (2015), 98–107, 10.4230/OASIcs.CMN.2015.98.
10.
Stephanie Husby/Denilson Barbosa, “Topic Classification of Blog Posts Using Distant Supervision,” in: Proceedings of the Workshop on Semantic Analysis in Social Media 2012, 28–36, http://www.aclweb.org/anthology/W12-0604 (retrieved on June 10, 2018).
11.
E.g., Benjamin Schmidt, “Words Alone: Dismantling Topic Models in the Humanities,” in: Journal of Digital Humanities 2/1 (2012), http://journalofdigitalhumanities.org/2-1/words-alone-by-benjamin-m-schmidt/ (retrieved on June 10, 2018).
12.
Marcus Willand/Nils Reiter, “Geschlecht und Gattung. Digitale Analysen von Kleists Familie Schroffenstein,” in: Kleist-Jahrbuch (2017), 177–195, p. 184.
13.
The full dictionary is available on GitHub: https://github.com/quadrama/metadata/tree/b5ab13/fields (retrieved on June 10, 2018).
14.
Tomas Mikolov/Ilya Sutskever/Kai Chen et al., “Distributed Representations of Words and Phrases and their Compositionality,” in: Advances in Neural Information Processing Systems 26 (2013), 3111–3119.
15.
Benjamin Schmidt, WordVectors (2017), https://github.com/bmschmidt/wordVectors (retrieved on June 10, 2018).
16.
143M tokens in 2,737 texts; Frank Fischer/Jannik Strötgen, “When Does German Literature Take Place? – On the Analysis of Temporal Expressions in Large Corpora,” in: Proceedings of the Annual Conference of the Alliance of Digital Humanities Organizations (2015), https://people.mpi-inf.mpg.de/~jstroetge/papers/2015-DH-FischerStroetgen-WhenDoesGermanLiteratureTakePlace.pdf (retrieved on June 10, 2018).
17.
Birgit Hamp/Helmut Feldweg, “GermaNet-a Lexical-Semantic Net for German,” in: Proceedings of the Workshop on Automatic Information Extraction and Building of Lexical Semantic Resources for NLP Applications on the Conference of the Association of Computational Linguistics 1997, https://www.aclweb.org/anthology/W97-0802 (retrieved on June 10, 2018).
18.
David Blei/Andrew Y. Ng/Michael I. Jordan, “Latent Dirichlet Allocation,” in: Journal of Machine Learning Research 3 (2003), 993–1022.
19.
Ted Underwood/Andrew Goldstone, “What can topic models of PMLA teach us about the history of literary scholarship?” in: Journal of Digital Humanities 2/1 (2014), http://journalofdigitalhumanities.org/2-1/what-can-topic-models-of-pmla-teach-us-by-ted-underwood-and-andrew-goldstone/ (retrieved on June 10, 2018).
20.
Lisa M. Rhody, “Topic Modeling and Figurative Language,” in: Journal of Digital Humanities 12/1 (2012), 19–38.
21.
Harvey Quamen/Paul Hjartarson, “Big Data and the Literary Archive: Topic Modeling the Watson-McLuhan Correspondence,” in: Proceedings of Dh (2014), http://dharchive.org/paper/DH2014/Poster-94.xml (retrieved on June 10, 2018).
22.
Schöch (Ann. 6).
23.
Willand/Reiter (Ann. 12).
24.
Franco Moretti, Distant Reading, London 2013, 227.
25.
Volker Klotz, Geschlossene und offene Form im Drama, München 1960.
26.
Scherer dates the liaison des scènes as a common rule back to 1640ies France. Cf. Jacques Scherer, La dramaturgie classique en France, Paris 1950. The rise and fall of the interrelations between ceremonial and dramatic principles of court representation describes Juliane Vogel, “Aus dem Takt: Auftrittsstrukturen in Schillers Don Carlos,” in: Deutsche Vierteljahrsschrift für Literaturwissenschaft und Geistesgeschichte 86/4 (2012), 532–546.
27.
Johann C. Gottsched, Versuch einer Critischen Dichtkunst vor die Deutschen, Leipzig 1730, 585. Transl. by the authors, NR/MW.
28.
Mary Di Michele, Tenor of Love, New York 2005, 20.
29.
Cf. for a brief sketch of the so called estates-clause Harald Weinrich, “Hoch und Niedrig in der Literatur,” in: Horst Rüdiger (ed.), Literatur und Dichtung, Stuttgart u. a. 1973, 160–170.
30.
Manfred Pfister, The Theory and Analysis of Drama, Cambridge, UK, New York 1988, 241–243.
31.
An outstanding description of drama-historical changes and their dependencies from the social and religious values of a time can be found in Hebbels poetological foreword to “Maria Magdalena, betreffend das Verhältnis der dramatischen Kunst zur Zeit und verwandte Punkte,” cf. Friedrich Hebbel, “Vorwort zur’Maria Magdalena’, betreffend das Verhältnis der dramatischen Kunst zur Zeit und verwandte Punkte,” in: Sämmtliche Werke, ed. Richard M. Werner, Berlin 1911, 39–65.
32.
Cf. Bernhard Asmuth, Einführung in die Dramenanalyse, Stuttgart ⁵1997.
33.
Karl Pearson, “Notes on Regression and Inheritance in the Case of Two Parents,” in: Proceedings of the Royal Society of London 58 (1895), 240–242.
34.
Ibid. As we compare different results, it is important to keep this setting constant over all our experiments. Other choices have not been explored.
35.
This is of course related to the configuration matrix (cf. Pfister [Ann. 30]), but based on scenes instead of configurations.
36.
Cf. Albrecht C. Rotth (Vollständige Deutsche Poesie: In drey Theilen, Leipzig 1688, 141), who postulates: “Die Zahl der Scenen in welche man die Actus wiederum eintheilet/stehen dem Poeten frey. Doch nimmt man ordentlicher Weise über 5. oder 6. derselben nicht gerne/führt auch über 4. Personen redend nicht ein/Damit sie den Zuschauer nicht verwirren; Stumme Personen aber mögen seyn so viel ihr woollen.”
37.
https://textgridrep.org (retrieved on June 10, 2018).
38.
http://uima.apache.org (retrieved on June 10, 2018).
39.
Nils Reiter, DramaNLP (2017), 10.5281/zenodo.400627.
40.
Kristina Toutanova/Dan Klein/Christopher Manning et al., “Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network,” in: Proceedings of HLT-NAACL (2003), 252–259.
41.
Anders Björkelund/Bernd Bohnet/Love Hafdell et al., “A High-Performance Syntactic and Semantic Dependency Parser,” in: Coling 2010: Demonstrations, 33–36, http://www.aclweb.org/anthology/C10-3009 (retrieved on June 10, 2018).
42.
Richard Eckart de Castilho/Iryna Gurevych, “A broad-coverage collection of portable NLP components for building shareable analysis pipelines,” in: Proceedings of the Workshop on Open Infrastructures and Analysis Frameworks for Hlt, Dublin (2014), 1–11, http://www.aclweb.org/anthology/W14-5201 (retrieved on June 10, 2018).
43.
Version 2, Nils Reiter, DramaAnalysis, 2017, 10.5281/zenodo.847167.
44.
Solomon Marcus, Mathematische Poetik, Bucuresti 1973; Richard W. Hamming, “Error detecting and error correcting codes,” in: The Bell System Technical Journal 29/2 (1950), 147–160, 10.1002/j.1538-7305.1950.tb00463.x (retrieved on June 10, 2018).
45.
Peer Trilcke/Frank Fischer/Mathias Göbel et al., “Netzwerkdynamik und Plotanalyse. Zur Visualisierung und Berechnung der ‘progressiven Strukturierung’ literarischer Texte,” in: Book of Abstracts of Dhd 2017, Bern 2017, 175–180.
46.
Christoph Draxler, “Computerunterstützte Dramenanalyse,” Master’s thesis, LMU Munich 1988.
47.
Ibid. correctly notes that this case cannot occur. His analysis is based on configurations, and configuration boundaries are defined by the fact that a character enters or leaves the stage. Our analysis is based on scene boundaries, and it could well be the case that no personnel change takes place.
48.
Cf. Bernhard Asmuth, “Charakter,” in: Klaus Weimar/Harald Fricke/Klaus Grubmüller et al. (eds.), Reallexikon der deutschen Literaturwissenschaft, Berlin 1997, 297–299, here: 37–40.
49.
Pfister (Ann. 30), 238.
50.
The English term ‘scene’ is used as diverse as the German or French; cf. Bruce Smith, “Scene,” in: Henry S. Turner (ed.), Early modern theatricality, Oxford 2013, 93–112.
51.
Cf. Fotis Jannidis/Isabella Reger/Albin Zehe et al., “Analyzing Features for the Detection of Happy Endings in German Novels,” in: DHd2017: Book of Abstracts, Bern 2017, 81–85, arXiv:1611.09028 [cs.IR] (November 28, 2016), https://arxiv.org/abs/1611.09028, http://www.dhd2017.ch/wp-content/uploads/2017/03/Abstractband_def3_M%C3%A4rz.pdf (retrieved on June 10, 2018).
52.
Franco Moretti, “‘Operationalizing’: or, the function of measurement in modern literary theory,” in: Pamphlets of the Stanford Literary Lab 6 (2013), https://litlab.stanford.edu/LiteraryLabPamphlet6.pdf (retrieved on June 10, 2018).
53.
Nils Reiter/Marcus Willand, “Surveying Shakespeare’s Impact on the German Drama: Taking a Computational Approach to an Epoch,” in: Sandro Jung and Michael Wood (eds.), Anglo-German Dramatic an Poetic Culutures: New Perspectives on Exchange in the Sattelzeit. Bethlehem, PA, pp- 117–143 (to appear 2019).
54.
Ibid.
55.
https://quadrama.github.io/(retrieved on June 10, 2018).

References

All digital references were last viewed on June 10, 2018.
Google Scholar
Algee-Hewitt, Mark, “Distributed Character: Quantitative Models of the English Stage, 1500–1920,” in: Book of abstracts (2017), 119–121, https://dh2017.adho.org/abstracts/DH2017-abstracts.pdf.
Asmuth, Bernhard, “Charakter,” in: Klaus Weimar/Harald Fricke/Klaus Grubmüller et al. (eds.), Reallexikon der deutschen Literaturwissenschaft, Berlin 1997, 297–299.
Google Scholar
Asmuth, Bernhard, Einführung in die Dramenanalyse, Stuttgart ⁵1997.
Google Scholar
Björkelund, Anders/Bohnet, Bernd/Hafdell, Love et al., “A High-Performance Syntactic and Semantic Dependency Parser,” in: Coling 2010: Demonstrations, 33–36, http://www.aclweb.org/anthology/C10-3009.
Blei, David/Ng, Andrew Y./Jordan, Michael I., “Latent Dirichlet Allocation,” in: Journal of Machine Learning Research 3 (2003), 993–1022.
Google Scholar
Bögel, Thomas/Gertz, Michael/Gius, Evelyn et al., “Collaborative Text Annotation Meets Machine Learning: heureCLÉA, a Digital Heuristic of Narrative,” in: DHCommons 1 (July 2015), DOI https://doi.org/10.5281/zenodo.3240591.
Bullard, Joseph/Ovesdotter Alm, Cecilia, “Computational Analysis to Explore Authors’ Depiction of Characters,” in: Proceedings of the 3rd Workshop on Computational Linguistics for Literature (2014), 11–16, http://www.aclweb.org/anthology/W14-0902.
Draxler, Christoph, “Computerunterstützte Dramenanalyse,” Master’s thesis, LMU Munich 1988.
Google Scholar
Eckart de Castilho, Richard/Gurevych, Iryna, “A broad-coverage collection of portable NLP components for building shareable analysis pipelines,” in: Proceedings of the Workshop on Open Infrastructures and Analysis Frameworks for Hlt, Dublin (2014), 1–11, http://www.aclweb.org/anthology/W14-5201.
Fischer, Frank/Strötgen, Jannik, “When Does German Literature Take Place?—On the Analysis of Temporal Expressions in Large Corpora,” in: Proceedings of the Annual Conference of the Alliance of Digital Humanities Organizations (2015), https://people.mpi-inf.mpg.de/~jstroetge/papers/2015-DH-FischerStroetgen-WhenDoesGermanLiteratureTakePlace.pdf.
Fisseni, Bernhard/Kurji, Aadil/Löwe, Benedikt, “Annotating with Propp’s Morphology of the Folktale: Reproducibility and Trainability,” in: Literary and Linguistic Computing 29/4 (2014), 488–510, DOI https://doi.org/10.1093/llc/fqu050.
Article Google Scholar
Gottsched, Johann C., Versuch einer Critischen Dichtkunst vor die Deutschen, Leipzig 1730.
Google Scholar
Hamming, Richard W., “Error detecting and error correcting codes,” in: The Bell System Technical Journal 29/2 (1950), 147–160, DOI https://doi.org/10.1002/j.1538-7305.1950.tb00463.x.
Article Google Scholar
Hamp, Birgit/Feldweg, Helmut, “GermaNet-a Lexical-Semantic Net for German,” in: Proceedings of the Workshop on Automatic Information Extraction and Building of Lexical Semantic Resources for NLP Applications on the Conference of the Association of Computational Linguistics 1997, https://www.aclweb.org/anthology/W97-0802.
Hebbel, Friedrich, “Vorwort zur ’Maria Magdalena’, betreffend das Verhältnis der dramatischen Kunst zur Zeit und verwandte Punkte,” in: Sämmtliche Werke, ed. Richard M. Werner, Berlin 1911, 39–65.
Google Scholar
Husby, Stephanie/Barbosa, Denilson, “Topic Classification of Blog Posts Using Distant Supervision,” in: Proceedings of the Workshop on Semantic Analysis in Social Media 2012, 28–36, http://www.aclweb.org/anthology/W12-0604.
Jannidis, Fotis/Reger, Isabella/Zehe, Albin et al., “Analyzing Features for the Detection of Happy Endings in German Novels,” in: DHd2017: Book of Abstracts, Bern 2017, 81–85, arXiv:1611.09028 [cs.IR] (November 28, 2016), https://arxiv.org/abs/1611.09028, http://www.dhd2017.ch/wp-content/uploads/2017/03/Abstractband_def3_M%C3%A4rz.pdf.
Karsdorp, Folgert/Kestemont, Mike/Schöch, Christof et al., “The Love Equation: Computational Modeling of Romantic Relationships in French Classical Drama,” in: Mark A. Finlayson/Antonio Lieto/Ben Miller (eds.), 6th Workshop on Computational Models of Narrative 45 (2015), 98–107, DOI https://doi.org/10.4230/OASIcs.CMN.2015.98.
Klotz, Volker, Geschlossene und offene Form im Drama, München 1960.
Google Scholar
Marcus, Solomon, Mathematische Poetik, Bucuresti 1973.
Google Scholar
Michele, Mary Di, Tenor of Love, New York 2005.
Google Scholar
Mikolov, Tomas/Sutskever, Ilya/Chen, Kai et al., “Distributed Representations of Words and Phrases and their Compositionality,” in: Advances in Neural Information Processing Systems 26 (2013), 3111–3119.
Google Scholar
Moretti, Franco, “Network Theory, Plot Analysis,” in: Pamphlets of the Stanford Literary Lab 2 (2011), https://litlab.stanford.edu/LiteraryLabPamphlet2.pdf.
Moretti, Franco, Distant Reading, London 2013.
Google Scholar
Moretti, Franco, “‘Operationalizing’: or, the function of measurement in modern literary theory,” in: Pamphlets of the Stanford Literary Lab 6 (2013), https://litlab.stanford.edu/LiteraryLabPamphlet6.pdf.
Nalisnick, Eric T./Baird, Henry S., “Character-to-Character Sentiment Analysis in Shakespeare’s Plays,” in: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, Vol. 2: Short Papers, Sofia 2013, 479–483, http://www.aclweb.org/anthology/P13-2085.
Pearson, Karl, “Notes on Regression and Inheritance in the Case of Two Parents,” in: Proceedings of the Royal Society of London 58 (1895), 240–242.
Article Google Scholar
Pfister, Manfred, The Theory and Analysis of Drama, Cambridge, UK, New York 1988.
Book Google Scholar
Quamen, Harvey/Hjartarson, Paul, “Big Data and the Literary Archive: Topic Modeling the Watson-McLuhan Correspondence,” in: Proceedings of Dh (2014), http://dharchive.org/paper/DH2014/Poster-94.xml.
Reiter, Nils, DramaAnalysis (2017), DOI https://doi.org/10.5281/zenodo.847167.
Article Google Scholar
Reiter, Nils, DramaNLP (2017), DOI https://doi.org/10.5281/zenodo.400627.
Article Google Scholar
Nils Reiter/Marcus Willand, “Surveying Shakespeare’s Impact on the German Drama: Taking a Computational Approach to an Epoch,” in: Sandro Jung and Michael Wood (eds.), Anglo-German Dramatic an Poetic Cultures: New Perspectives on Exchange in the Sattelzeit. Bethlehem, PA, pp- 117–143 (to appear 2019).
Google Scholar
Reiter, Nils/Gius, Evelyn/Strötgen, Jannik et al., “A Shared Task for a Shared Goal—Systematic Annotation of Literary Texts,” in: Digital Humanities 2017: Conference Abstracts, Montreal 2017, https://dh2017.adho.org/abstracts/192/192.pdf.
Rhody, Lisa M., “Topic Modeling and Figurative Language,” in: Journal of Digital Humanities 12/1 (2012), 19–38.
Google Scholar
Rotth, Albrecht C., Vollständige Deutsche Poesie: In drey Theilen, Leipzig 1688.
Google Scholar
Scherer, Jacques, La dramaturgie classique en France, Paris 1950.
Google Scholar
Schmidt, Benjamin, WordVectors (2017), https://github.com/bmschmidt/wordVectors.
Schmidt, Benjamin, “Words Alone: Dismantling Topic Models in the Humanities,” in: Journal of Digital Humanities 2/1 (2012), http://journalofdigitalhumanities.org/2-1/words-alone-by-benjamin-m-schmidt/.
Schöch, Christof, “Topic Modeling Genre: An Exploration of French Classical and Enlightenment Drama,” in: Digital Humanities Quarterly 2 (November 2017), DOI https://doi.org/10.5281/zenodo.166356.
Smith, Bruce, “Scene,” in: Henry S. Turner (ed.), Early modern theatricality, Oxford 2013, 93–112.
Google Scholar
Toutanova, Kristina/Klein, Dan/Manning, Christopher et al., “Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network,” in: Proceedings of Hlt-Naacl (2003), 252–259.
Google Scholar
Trilcke, Peer/Fischer, Frank/Kampkaspar, Dario, “Digital Network Analysis of Dramatic Texts,” in: DH2015 Conference Abstracts, Sydney 2015, https://dlina.github.io/presentations/2015-sydney/sydney.html/.
Trilcke, Peer/Fischer, Frank/Göbel, Mathias et al., “Netzwerkdynamik und Plotanalyse. Zur Visualisierung und Berechnung der ‘progressiven Strukturierung’ literarischer Texte,” in: Book of Abstracts of Dhd 2017, Bern 2017, 175–180
Google Scholar
Underwood, Ted/Goldstone, Andrew, “What can topic models of PMLA teach us about the history of literary scholarship?” in: Journal of Digital Humanities 2/1 (2014), http://journalofdigitalhumanities.org/2-1/what-can-topic-models-of-pmla-teach-us-by-ted-underwood-and-andrew-goldstone/.
Vogel, Juliane, “Aus dem Takt: Auftrittsstrukturen in Schillers Don Carlos,” in: Deutsche Vierteljahrsschrift für Literaturwissenschaft und Geistesgeschichte 86/4 (2012), 532–546.
Article Google Scholar
Weinrich, Harald, “Hoch und Niedrig in der Literatur,” in: Horst Rüdiger (ed.), Literatur und Dichtung, Stuttgart u.a. 1973, 160–170.
Google Scholar
Willand, Marcus, “Hermeneutische Interpretation und digitale Analyse. Eine Verhältnisbestimmung,” in: Luisa Banki (ed.), Lektüren. Positionen Zeitgenössischer Philologie, Trier 2017, 77–100.
Google Scholar
Willand, Marcus/Reiter, Nils, “Geschlecht und Gattung. Digitale Analysen von Kleists Familie Schroffenstein,” in: Kleist-Jahrbuch (2017), 177–195.
Google Scholar

Digital Resources

Apache UIMA, http://uima.apache.org.
GitHub, QuaDramA, https://quadrama.github.io.
TextGrid Repository, https://textgridrep.org.

Download references

Acknowledgements

The work described in this article has been conducted in the context of the research project QuaDramA—Quantitative Drama Analytics^{Footnote 55}, generously funded by the Volkswagen Foundation.

Author information

Authors and Affiliations

Universität zu Köln, Köln, Deutschland
Nils Reiter
Heidelberg, Deutschland
Marcus Willand

Authors

Nils Reiter
View author publications
You can also search for this author in PubMed Google Scholar
Marcus Willand
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Nils Reiter .

Editor information

Editors and Affiliations

Universität Würzburg, Würzburg, Deutschland
Fotis Jannidis

Appendix

GermaNet Concepts

POS	Fields
Adjectives	Allgemein, Bewegung, Gefuehl, Geist, Gesellschaft, Koerper, Menge, natPhaenomen, Ort, Pertonym, Perzeption, privativ, Relation, Substanz, Verhalten, Zeit
Nouns	Artefakt, Attribut, Besitz, Form, Gefuehl, Geschehen, Gruppe, Koerper, Kognition, Kommunikation, Menge, Mensch, Motiv, Nahrung, natGegenstand, natPhaenomen, Ort, Pflanze, Relation, Substanz, Tier, Tops, Zeit
Verbs	Allgemein, Besitz, Gefuehl, Gesellschaft, Koerperfunktion, Kognition, Kommunikation, Konkurrenz, Kontakt, Lokation, natPhaenomen, Perzeption, Schoepfung, Veraenderung, Verbrauch

Corpus

Title	Author
Der Gwissenswurm	Anzengruber, Ludwig
Heimg’funden	Anzengruber, Ludwig
Die Kreuzelschreiber	Anzengruber, Ludwig
Der Rauchfangkehrer	Auenbrugger, Johann Leopold von
Der Postzug oder die noblen Passionen	Ayrenhoff, Cornelius Hermann von
Industrie und Herz	Bauernfeld, Eduard von
Großjährig	Bauernfeld, Eduard von
Bürgerlich und Romantisch	Bauernfeld, Eduard von
Die Hochzeitsreise	Benedix, Julius Roderich
Ein Faust der That	Bleibtreu, Karl
Im weißen Rößl	Blumenthal, Oskar
Der Bookesbeutel	Borkenstein, Hinrich
De rode Ünnerrock	Boßdorf, Hermann
Faust	Braun von Braunthal, Karl Johann
Der Freigeist	Brawe, Joachim Wilhelm von
Leonce und Lena	Büchner, Georg
Der Mißtrauische	Cronegk, Johann Friedrich von
Die Gunst des Augenblicks	Devrient, Philipp Eduard
Die Freier	Eichendorff, Joseph von
Der letzte Held von Marienburg	Eichendorff, Joseph von
Eid und Pflicht	Engel, Johann Jakob
Der Frauenmut	Essig, Hermann
Die Journalisten	Freytag, Gustav
Die zärtlichen Schwestern	Gellert, Christian Fürchtegott
Die Betschwester	Gellert, Christian Fürchtegott
Der Moloch	Gerhäuser, Emil
Ugolino	Gerstenberg, Heinrich Wilhelm von
Der Großkophta	Goethe, Johann Wolfgang
Faust. Der Tragödie zweiter Teil	Goethe, Johann Wolfgang
Clavigo	Goethe, Johann Wolfgang
Die natürliche Tochter	Goethe, Johann Wolfgang
Egmont	Goethe, Johann Wolfgang
Der sterbende Cato	Gottsched, Johann Christoph
Das Testament	Gottsched, Luise Adelgunde Victorie
Kaiser Heinrich der Sechste	Grabbe, Christian Dietrich
Kaiser Friedrich Barbarossa	Grabbe, Christian Dietrich
Don Juan und Faust	Grabbe, Christian Dietrich
Herzog Theodor von Gothland	Grabbe, Christian Dietrich
Des Meeres und der Liebe Wellen	Grillparzer, Franz
Die Jüdin von Toledo	Grillparzer, Franz
Ein Bruderzwist in Habsburg	Grillparzer, Franz
Die Ahnfrau	Grillparzer, Franz
König Ottokars Glück und Ende	Grillparzer, Franz
Ein treuer Diener seines Herrn	Grillparzer, Franz
Das Urbild des Tartüffe	Gutzkow, Karl
Der Furchtsame	Hafner, Philipp
Mägera, die förchterliche Hexe	Hafner, Philipp
Hanna Jagert	Hartleben, Otto Erich
Rosenmontag	Hartleben, Otto Erich
Der Diamant	Hebbel, Friedrich
Der Rubin	Hebbel, Friedrich
Herodes und Mariamne	Hebbel, Friedrich
Gyges und sein Ring	Hebbel, Friedrich
Genoveva	Hebbel, Friedrich
Judith	Hebbel, Friedrich
Kriemhilds Rache	Hebbel, Friedrich
Maria Magdalene	Hebbel, Friedrich
Agnes Bernauer	Hebbel, Friedrich
Die Kinder Godunófs	Heiseler, Henry von
Don Juan’s Ende	Heyse, Paul
Der Rosenkavalier	Hofmannsthal, Hugo von
Der Unbestechliche	Hofmannsthal, Hugo von
Der Schwierige	Hofmannsthal, Hugo von
Der Turm	Hofmannsthal, Hugo von
Der Tod des Empedokles	Hölderlin, Friedrich
Ein Trauerspiel in Berlin	Holtei, Karl von
Ignorabimus	Holz, Arno
Das Gericht von St. Petersburg	Immermann, Karl
Andreas Hofer, der Sandwirt von Passeyer	Immermann, Karl
Die Opferung	Kaltneker, Hans
Amphitryon	Kleist, Heinrich von
Der zerbrochene Krug	Kleist, Heinrich von
Die Familie Schroffenstein	Kleist, Heinrich von
Faust. Ein Trauerspiel in fünf Acten	Klingemann, August
Die Zwillinge	Klinger, Friedrich Maximilian
Das leidende Weib	Klinger, Friedrich Maximilian
Der Tod Adams	Klopstock, Friedrich Gottlieb
Die deutschen Kleinstädter	Kotzebue, August von
Die Indianer in England	Kotzebue, August von
Die beiden Klingsberg	Kotzebue, August von
Die Candidaten oder Die Mittel zu einem Amte zu gelangen	Krüger, Johann Christian
Die Geistlichen auf dem Lande	Krüger, Johann Christian
Franz von Sickingen	Lassalle, Ferdinand
Gottsched und Gellert	Laube, Heinrich
Julius von Tarent	Leisewitz, Johann Anton
Die Freunde machen den Philosophen	Lenz, Jakob Michael Reinhold
Der Hofmeister oder Vorteile der Privaterziehung	Lenz, Jakob Michael Reinhold
Der neue Menoza	Lenz, Jakob Michael Reinhold
Die Soldaten	Lenz, Jakob Michael Reinhold
Die alte Jungfer	Lessing, Gotthold Ephraim
Der Freigeist	Lessing, Gotthold Ephraim
Minna von Barnhelm, oder das Soldatenglück	Lessing, Gotthold Ephraim
Der junge Gelehrte	Lessing, Gotthold Ephraim
Der Misogyn	Lessing, Gotthold Ephraim
Die Juden	Lessing, Gotthold Ephraim
Samuel Henzi	Lessing, Gotthold Ephraim
Miß Sara Sampson	Lessing, Gotthold Ephraim
Emilia Galotti	Lessing, Gotthold Ephraim
Cleopatra	Lohenstein, Daniel Casper von
Epicharis	Lohenstein, Daniel Casper von
Ibrahim	Lohenstein, Daniel Casper von
Agrippina	Lohenstein, Daniel Casper von
Die Makkabäer	Ludwig, Otto
Der Erbförster	Ludwig, Otto
Der alte Bürger-Capitain	Malß, Karl
König Yngurd	Müllner, Adolph
Die Schuld	Müllner, Adolph
Die Schäferinsel	Mylius, Christlob
Das Schäferfest oder Die Herbstfreude	Neuber, Friederike Caroline
Das Liebeskonzil	Panizza, Oskar
Lucie Woodvil	Pfeil, Johann Gottlob Benjamin
Die verhängnisvolle Gabel	Platen, August von
Der romantische Ödipus	Platen, August von
Die politische Wochenstube	Prutz, Robert Eduard
Der Hypochondrist	Quistorp, Theodor Johann
Kritik und Antikritik	Raupach, Ernst
Graf Ehrenfried	Reuter, Christian
Die Fahnenweihe	Ruederer, Josef
Der fanatische Bürgermeister	Scheerbart, Paul
Okurirasûna	Scheerbart, Paul
Die Jungfrau von Orleans	Schiller, Friedrich
Kabale und Liebe	Schiller, Friedrich
Maria Stuart	Schiller, Friedrich
Die Verschwörung des Fiesco zu Genua	Schiller, Friedrich
Alarkos	Schlegel, Friedrich
Der geschäftige Müßiggänger	Schlegel, Johann Elias
Der Triumph der guten Frauen	Schlegel, Johann Elias
Canut	Schlegel, Johann Elias
Comedia des verlornen Sons	Schmeltzl, Wolfgang
Professor Bernhardi	Schnitzler, Arthur
Liebelei	Schnitzler, Arthur
Der Vetter in Lissabon	Schröder, Friedrich Ludwig
Die Komödie der Irrungen	Shakespeare, William
Julius Cäsar	Shakespeare, William
Hamlet. Prinz von Dänemark	Shakespeare, William
König Richard III	Shakespeare, William
König Lear	Shakespeare, William
Julie	Sturz, Helfrich Peter
Der Bettler von Syrakus	Sudermann, Hermann
Die Lokalbahn	Thoma, Ludwig
Prinz Zerbino oder die Reise nach dem guten Geschmack	Tieck, Ludwig
Agnes Bernauerin	Törring, Josef August von
Ernst Herzog von Schwaben	Uhland, Ludwig
Faust III	Vischer, Friedrich Theodor
Faust. Trauerspiel mit Gesang und Tanz	Voß, Julius von
Die Kindermörderin	Wagner, Heinrich Leopold
Frühlings Erwachen	Wedekind, Frank
Die Büchse der Pandora	Wedekind, Frank
Erdgeist	Wedekind, Frank
Bäurischer Machiavellus	Weise, Christian
Atreus und Thyest	Weiße, Christian Felix
Das Manuscript	Weißenthurn, Johanna von
Welcher ist der Bräutigam	Weißenthurn, Johanna von
Lady Johanna Gray	Wieland, Christoph Martin
Gracchus der Volkstribun	Wilbrandt, Adolf von
Die Karolinger	Wildenbruch, Ernst von
Armut	Wildgans, Anton

Rights and permissions

Open Access Dieses Kapitel wird unter der Creative Commons Namensnennung 4.0 International Lizenz (http://creativecommons.org/licenses/by/4.0/deed.de) veröffentlicht, welche die Nutzung, Vervielfältigung, Bearbeitung, Verbreitung und Wiedergabe in jeglichem Medium und Format erlaubt, sofern Sie den/die ursprünglichen Autor(en) und die Quelle ordnungsgemäß nennen, einen Link zur Creative Commons Lizenz beifügen und angeben, ob Änderungen vorgenommen wurden.

Die in diesem Kapitel enthaltenen Bilder und sonstiges Drittmaterial unterliegen ebenfalls der genannten Creative Commons Lizenz, sofern sich aus der Abbildungslegende nichts anderes ergibt. Sofern das betreffende Material nicht unter der genannten Creative Commons Lizenz steht und die betreffende Handlung nicht nach gesetzlichen Vorschriften erlaubt ist, ist für die oben aufgeführten Weiterverwendungen des Materials die Einwilligung des jeweiligen Rechteinhabers einzuholen.

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Reiter, N., Willand, M. (2022). What are they Talking About? A Systematic Exploration of Theme Identification Methods for Character Speech in Dramatic Texts. In: Jannidis, F. (eds) Digitale Literaturwissenschaft. Germanistische Symposien. J.B. Metzler, Stuttgart. https://doi.org/10.1007/978-3-476-05886-7_20

Download citation

DOI: https://doi.org/10.1007/978-3-476-05886-7_20
Published: 01 March 2023
Publisher Name: J.B. Metzler, Stuttgart
Print ISBN: 978-3-476-05885-0
Online ISBN: 978-3-476-05886-7
eBook Packages: J.B. Metzler Humanities (German Language)

Publish with us

Policies and ethics

What are they Talking About? A Systematic Exploration of Theme Identification Methods for Character Speech in Dramatic Texts

Zusammenfassung

Similar content being viewed by others

Classifying Written Texts Through Rhythmic Features

Drammar: A Comprehensive Ontological Resource on Drama

Dialogue analysis: a case study on the New Testament

1 Introduction

2 Comparing Systems without a Gold Standard

3 Theme Detection Systems

3.1 Dictionary-based Approaches

Custom Dictionary

Enriched Custom Dictionary

GermaNet Dictionary

3.2 Topic Modeling

4 Postulates about Dramatic Texts

4.1 Copresence

4.2 Scene Boundaries

4.3 Tragic or Happy Ends: Die or Marry

4.4 Dynamic and Static Protagonists

5 Experimental Setup

5.1 Formalization

5.2 Corpus

5.3 Preprocessing

6 Experiments

6.1 Copresence

6.2 Scene boundaries

6.3 Tragic or happy ends: Die or marry

6.4 Dynamic and static protagonists

7 Conclusions

Notes

References

Digital Resources

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Appendix

Appendix

Rights and permissions

Copyright information

About this chapter

Cite this chapter

Download citation

Share this chapter

Publish with us

Search

Navigation