1 Introduction

Automatic Text Generation (ATG) is a task that has been widely studied by researchers in the area of Natural Language Processing (NLP) [13, 20, 21, 22, 23]. Several investigations have presented very encouraging results, which have allowed progressively more challenging objectives to be established in this field. Currently, the scope of ATG is being expanded and there is much recent work that aims at generating text related to a specific domain [8, 10, 18]. In this line of research, algorithms have been developed for purposes as diverse as creating chat-bots for customer service in commercial applications, automatically generating summaries for academic support, or developing generators of literary poetry, short stories, novels, plays and essays [3, 17, 25].

The creation of a literary text is particularly interesting and challenging when compared to other types of ATG, as these texts are not perceived in the same way universally or perpetually. Furthermore, this perception can also vary depending on the reader’s mood. It can thus be assumed that literary perception is subjective and, from this perspective, it is difficult to ensure that text generated by an algorithm will be perceived as literature. To reduce this possible ambiguity regarding literary perception, we regard literature as text that employs a vocabulary which may be largely different from that used in common language, and that employs various writing styles, such as rhymes, anaphora, etc., and figures of speech, in order to obtain an artistic, complex, possibly elegant and emotional text [11]. This understanding gives us a guide for the development of our model, whereby literary sentences can be generated.

We present here a model for the generation of literary text (GLT) in Spanish which is guided by psychological characteristics of the personality of characters in literature. The model is based on the assumption that these psychological traits determine a person’s emotions and speech. We thus generate literary sentences based upon a situation or context, and also on psychological traits. It is then possible to perform an analysis of the personality of a character through the author’s writing, considering parts of speech such as verbs, adjectives, conjunctions, and the spinning of words or concepts, as well as other characteristics.

In Sect. 2, we present a review of the main literature that has addressed topics related to ATG, with focus on those that proposed methods and algorithms integrated into this work. We describe the corpus used to train our models in Sect. 3. Section 4 describes in detail the methodology followed for the development of our model. In Sect. 5, we show some experiments, as well as the results of human evaluations of the generated sentences. Finally, in Sect. 6 we present our conclusions.

2 Related Work

The task of ATG has been widely addressed by the research community in recent years. In [21], Szymanski and Ciota presented a model based on Markov sequences for the stochastic generation of text, where the intention is to generate grammatically correct text, although without considering a context or meaning. Shridaha et al. [20] present an algorithm to automatically generate descriptive summary comments for Java methods. Given the signature and body of a method, their generator identifies the content for the summary and generates natural language text that summarizes the method’s overall actions.

Work with an artistic, literary approach has also been developed for GLT. The works of Riedl and Young [17] and Clark, Ji and Smith [3] propose stochastic models, based on contextual analysis, for the generation of fictional narratives (stories). Zhang and Lapata [25] use neural networks and Oliveira [12, 13] uses the technique of canned text for generating poems. Some research has achieved the difficult task of generating large texts, overcoming the barrier of phrases and paragraphs, such as the MEXICA project [15].

Personality analysis is a complicated task that can be studied from different perspectives (see [19, 24] and references therein). Recent research has investigated the relation between the characteristics of literary text and personality, which can be understood as the complex of the behavioural, temperamental, emotional and mental attributes that characterise a unique individual [6, 7]. In [7], a type of personality is detected from the analysis of a text, using an artificial neural network (ANN), which classifies text into one of five personality classes considered by the authors (Extroversion, Neuroticism, Agreeableness, Conscientiousness, Openness). Some characteristics that are considered by the authors are writing styles and relationships between pairs of words.

3 Corpus

We have built two corpora in Spanish consisting of the main works of Johann Wolfgang von Goethe and Edgar Allan Poe, called cGoethe and cPoe, respectively. These corpora are analyzed and used to extract information about the vocabulary used by these authors. We later chose an important work for each author where the emotions and feelings, i.e. psychological traits, of the main characters are easily perceived by readers. For Goethe, we selected the novel The Sorrows of Young Werther [5] and for Poe, we selected the story The Cask of Amontillado [16], both in their Spanish version. The two corpora generated from these literary works were used to extract sentences that were later used as a basis for the generation of new sentences, as we describe in Sect. 4.1. We also use the corpus 5KL described in [11] for the final phase of the sentence generation procedure described in Sect. 4.2.

To build the cGoethe and cPoe corpora, we processed each constituent literary work (originally found in heterogeneous formats), creating a single document for each corpus, encoded in UTF-8. This processing consisted of automatically segmenting the text into phrases by means of regular expressions, using a program developed in Perl 5.0, to obtain one sentence per line in each corpus.
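The segmentation step can be pictured with a minimal sketch. The original program was written in Perl and its rules are not given here, so the splitting rule below (a sentence ends at `.`, `!`, `?` or `…` followed by whitespace) and the function name `segment_sentences` are assumptions made only for illustration.

```python
import re
from typing import List

# Assumed rule: a sentence ends at ., !, ? or … when followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?…])\s+")


def segment_sentences(text: str) -> List[str]:
    """Return the sentences found in `text`, one per list item."""
    # Collapse line breaks and repeated spaces left by heterogeneous formats.
    normalized = " ".join(text.split())
    return [s for s in _SENTENCE_END.split(normalized) if s]


# Writing one sentence per line, as in the corpora described above.
sample = "¡Qué noche!  No dije nada.\nÉl sonrió."
one_per_line = "\n".join(segment_sentences(sample))
```

A rule this simple would mis-split abbreviations and dialogue punctuation, so a real pipeline would probably need additional patterns.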

From the segmented phrases in cGoethe and cPoe, we selected only those that belong to the works The Sorrows of Young Werther (set 1) and The Cask of Amontillado (set 2). From sets 1 and 2, we manually extracted phrases that we considered to be very literary, to form two new corpora, cWerther and cCask, respectively. In this step, we chose phrases with complex vocabulary, a directly expressed message and some literary styles like rhymes, anaphoras, etc. Table 1 shows basic statistical information of the cGoethe and cPoe corpora. Table 2 shows similar information for cWerther and cCask. Table 3 shows statistical information of the 5KL corpus. The 5KL corpus contains 4 839 literary works, some originally written in Spanish and the rest in Spanish translation, from various authors and different genres. It is extremely useful, as its enormous vocabulary forms a highly representative set to train our Word2vec-based model, described in Sect. 4.

Table 1. Corpora formed by main works of each author. K represents one thousand and M represents one million.
Table 2. Corpora of literary phrases of one selected work.
Table 3. Corpus 5KL, composed of 4 839 literary works.

4 Model

We now describe our proposed model for the generation of literary phrases in Spanish, which is an extension of Model 3 presented in [11]. The model consists of two phases, described as follows. In the first phase, the Partially Empty Grammatical Structures (PGSs) are generated, each composed of elements that are either part-of-speech (POS) tags or function words. A PGS is constructed, through a morphosyntactic analysis made with FreeLing, for each phrase of the corpora cWerther and cCask, described in Sect. 3. The POS tags of these PGSs are replaced by words during the second phase.

FreeLing [14] is a commonly used tool for morphosyntactic analysis, which receives as input a string of text and returns as output a POS tag for each word in the string. The POS tag indicates the type of word (verb, noun, adjective, adverb, etc.), and also gives information about inflections, i.e., gender, conjugation and number. For example, for the word “Investigador” FreeLing generates the POS tag [NCMS000]. The first letter indicates a Noun, the second a Common noun, the third (M) stands for masculine gender and the fourth (S) gives number information (Singular). The last 3 characters give information about semantics, named entities, etc. We will use only the first 4 symbols of the POS tags.
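As a small illustration of the tag layout just described, the sketch below truncates a tag to its first 4 symbols and decodes a noun tag. It does not call FreeLing; the lookup tables are partial and contain only the symbols from the example plus a few assumed ones, and the function names are hypothetical.

```python
from typing import Dict

# Partial, illustrative lookup tables (assumed; not FreeLing's full tagset).
CATEGORY = {"N": "noun", "V": "verb", "A": "adjective", "R": "adverb"}
NOUN_TYPE = {"C": "common", "P": "proper"}
GENDER = {"M": "masculine", "F": "feminine"}
NUMBER = {"S": "singular", "P": "plural"}


def short_tag(tag: str) -> str:
    """Keep only the first 4 symbols of a POS tag, as the model does."""
    return tag[:4]


def describe_noun_tag(tag: str) -> Dict[str, str]:
    """Decode the first 4 symbols of a noun tag such as NCMS000.

    Assumes the tag has at least 4 symbols.
    """
    t = short_tag(tag)
    return {
        "category": CATEGORY.get(t[0], "unknown"),
        "type": NOUN_TYPE.get(t[1], "unknown"),
        "gender": GENDER.get(t[2], "unknown"),
        "number": NUMBER.get(t[3], "unknown"),
    }


info = describe_noun_tag("NCMS000")  # the tag given for "Investigador"
```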

In the second phase, each POS tag in the PGSs is replaced by a corresponding word, using a semantic approach algorithm based on an ANN model (Word2vec). Corpus 5KL, described in Sect. 3, was used for training Word2vec with the following parameters: only words with more than 5 occurrences in 5KL were considered, the size of the context window is 10, and the dimension of the vector representations was tested within a range of 50 to 100, with 60 giving the best results. The Word2vec model we have used is the continuous skip-gram model (Skip-gram), which receives a word (Query) as input and, as output, returns a set of words (embeddings) semantically related to the Query. The process for the generation of sentences is described in what follows.

4.1 Phase I: PGS Generation

For the generation of each PGS, we use methods guided by fixed morphosyntactic structures, called Template-based Generation or canned text. In [10], it is argued that the use of these techniques saves time in syntactic analysis and allows one to concentrate directly on the vocabulary. The canned text technique has also been used in several works, with specific purposes, such as in [4, 8], where the authors developed models for the generation of simple dialogues and phrases.

We use canned text to generate text based on templates obtained from cCask and cWerther. These corpora contain flexible grammatical structures that can be manipulated to create new phrases. The templates in a corpus can be selected randomly or through heuristics, according to a predefined objective. The process starts with the random selection of an original phrase \(f_{o} \in \) corpus of length \( N = |f_{o}|\). A template PGS is built from the words of \(f_o\), where content words, verbs (v), nouns (n) or adjectives (a), are replaced by their respective POS tags and function words are retained. \(f_{o}\) is analyzed with FreeLing and words with “values” in \(\{v, n, a\}\) are replaced by their respective POS tags. These content words provide most of the information in any text, regardless of their length or genre [2]. Our hypothesis is that by changing only content words, we simulate the generation of phrases by homo-syntax: different semantics, same structure. The output of this process is a PGS with function words that give grammatical support and POS tags that will be replaced, in order to change the meaning of the sentence. Phase I is illustrated in Fig. 1, where full boxes represent function words and empty boxes represent POS tags.

Fig. 1. Canned text model for generating a PGS.
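Phase I can be sketched in a few lines, assuming the phrase has already been tagged. The tagged phrase below is written by hand for illustration (it is not FreeLing output and does not come from cCask or cWerther), and `build_pgs` is a hypothetical name.

```python
from typing import List, Tuple

# Content words: verbs (V), nouns (N) and adjectives (A).
CONTENT_CATEGORIES = {"V", "N", "A"}


def build_pgs(tagged_phrase: List[Tuple[str, str]]) -> List[str]:
    """Replace content words by the first 4 symbols of their POS tag.

    Function words are kept unchanged, so they continue to give
    grammatical support to the template.
    """
    pgs = []
    for word, tag in tagged_phrase:
        if tag[:1] in CONTENT_CATEGORIES:
            pgs.append(tag[:4])
        else:
            pgs.append(word)
    return pgs


# Hand-tagged toy phrase (hypothetical tags in the style of NCMS000).
f_o = [
    ("El", "DA0MS0"),
    ("investigador", "NCMS000"),
    ("escribe", "VMIP3S0"),
    ("una", "DI0FS0"),
    ("carta", "NCFS000"),
    ("larga", "AQ0FS0"),
]
pgs = build_pgs(f_o)
```

The resulting template keeps "El" and "una" and exposes four slots, one per content word, to be filled in Phase II.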

4.2 Phase II: Semantic Analysis and Substitution of POS Tags

In this phase, the POS tags of the PGS generated in Phase I are replaced. Tags corresponding to nouns are replaced with a vocabulary close to the context defined by the user (the query), while verbs and adjectives are replaced with a vocabulary more similar in meaning to the original terms of \(f_o\). The idea is to preserve the style and emotional-psychological content, that the author intended to associate with the characters that he is portraying in his work.

Corpus 5KL is pre-processed to standardize text formatting, eliminating characters that are not important for semantic analysis, such as punctuation and numbers. This stage prepares the training data from which Word2vec builds a vector representation of 5KL. We use Gensim, a Python implementation of Word2vec. A query Q, provided by the user, is given as input to this algorithm, and its output is a set of words (embeddings) associated with a context defined by Q. In other words, Word2vec receives a term Q and returns a lexicon \(L(Q)=(Q_1,Q_2,...,Q_m)\), that represents a set of \(m=10\) words semantically close to Q. We chose this value of m because we found that if we increase the number of words obtained by Word2vec, they start to lose their relation to Q. Formally, we represent a mapping by Word2vec as \(Q \rightarrow L(Q)\).
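The mapping \(Q \rightarrow L(Q)\) amounts to a nearest-neighbour lookup in the embedding space. The sketch below reproduces that idea in plain Python over a tiny hand-made vector table; the vectors, the words and the function names are assumptions for illustration and are unrelated to the model trained on 5KL. With a trained Gensim model, the analogous query would typically be `model.wv.most_similar(Q, topn=10)`.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is null)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def lexicon(query: str, vectors: Dict[str, Sequence[float]], m: int = 10) -> List[str]:
    """Return L(Q): the m words closest to `query`, most similar first."""
    q = vectors[query]
    others = [w for w in vectors if w != query]
    ranked = sorted(others, key=lambda w: cosine(q, vectors[w]), reverse=True)
    return ranked[:m]


# Toy 2-dimensional embeddings, invented for this example.
TOY_VECTORS = {
    "amor": (0.9, 0.1),
    "cariño": (0.8, 0.2),
    "odio": (-0.9, 0.1),
    "luna": (0.1, 0.9),
}
closest = lexicon("amor", TOY_VECTORS, m=2)
```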

The corpora cGoethe and cPoe were previously analyzed with FreeLing to obtain PGSs, so that a POS tag is associated with each content word. Each POS tag is now used to create a set of words with the same grammatical information (identical POS tags). An Association Table (AT) is generated as a result of this process. The AT consists of entries of the type: POS\(_k \rightarrow \) list of words \(v_{k,i}\), with same grammatical information, formally POS\(_k \rightarrow {\varvec{V}_k} =\{v_{k,1},v_{k,2},...,v_{k,i},...\}\). To generate a new phrase, each tag POS\(_k\) \(\in \) PGS is replaced by a word selected from the lexicon \(\varvec{V}_k\), given by AT.
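A minimal sketch of how such an Association Table could be assembled from tagged words is given below. The input pairs are invented for illustration, the name `build_association_table` is hypothetical, and lower-casing the words is an assumption that the text does not state.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

CONTENT_CATEGORIES = {"V", "N", "A"}  # verbs, nouns, adjectives


def build_association_table(
    tagged_words: Iterable[Tuple[str, str]]
) -> Dict[str, List[str]]:
    """Map each 4-symbol POS tag to the words that carry it (POS_k -> V_k)."""
    table = defaultdict(set)
    for word, tag in tagged_words:
        if tag[:1] in CONTENT_CATEGORIES:
            # Assumption: case is ignored when grouping words.
            table[tag[:4]].add(word.lower())
    # Sorted lists make the table deterministic and easy to inspect.
    return {tag: sorted(words) for tag, words in table.items()}


# Hand-tagged toy input (hypothetical; not taken from cGoethe or cPoe).
tagged = [
    ("carta", "NCFS000"),
    ("noche", "NCFS000"),
    ("vino", "NCMS000"),
    ("de", "SPS00"),
    ("Carta", "NCFS000"),
]
AT = build_association_table(tagged)
```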

To choose a word in \(\varvec{V}_k\) to replace POS\(_k\), we use the following algorithm. A vector is constructed for each of the three words defined as:

  • o: the \(k\)-th word in the phrase \(f_{o}\), corresponding to tag POS\(_k\);

  • Q: word defining the query provided by the user;

  • w: candidate word that could replace POS\(_k\), \(w \in {\varvec{V}}_k\).

For each word o, Q and w, 10 closest words, \(o_i\), \(Q_i\) and \(w_i\), \(i = 1,...,10\), are obtained with Word2vec. These 30 words are concatenated and represented by a vector \(\varvec{U}\) with dimension 30. The dimension was set to 30, as a compromise between lexical diversity and processing time. The vector \(\varvec{U}\) can be written as

$$\begin{aligned} \varvec{U} = (o_{1},...,o_{10},Q_{11},...,Q_{20},w_{21},...,w_{30})\, = (u_1, u_2,...,u_{30}) \,. \end{aligned}$$
(1)

Words o, Q and w each generate a numerical vector of 30 dimensions, o \( \rightarrow \varvec{X} = (x_1,...,x_{30} )\), Q \( \rightarrow \varvec{Q} = (q_1,...,q_{30} )\), and w \(\rightarrow \varvec{W} = (w_1,...,w_{30} )\), where the elements \(x_j\) of \(\varvec{X}\) are obtained by taking the distance \(x_j = dist(o,u_j) \in [0,1]\), between o and each \(u_j \in \varvec{U}\), provided by Word2vec. Obviously, o will be closer to the first 10 \(u_j\) than to the remaining ones. A similar process is used to obtain the elements of \(\varvec{Q}\) and \(\varvec{W}\) from Q and w, respectively. Cosine similarities are then calculated between \(\varvec{Q}\) and \(\varvec{W}\), and between \(\varvec{X}\) and \(\varvec{W}\), as

$$\begin{aligned} \theta= & {} \cos (\varvec{Q},\varvec{W}) = \frac{\varvec{Q} \cdot \varvec{W}}{|\varvec{Q}| |\varvec{W}|} \, , \,\,\,\, 0 \le \theta \le 1 \, , \end{aligned}$$
(2)
$$\begin{aligned} \beta= & {} \cos (\varvec{X},\varvec{W}) = \frac{\varvec{X} \cdot \varvec{W}}{|\varvec{ X}| |\varvec{W}|} \, , \,\,\,\, 0 \le \beta \le 1 \, . \end{aligned}$$
(3)
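The construction of \(\varvec{X}\), \(\varvec{Q}\) and \(\varvec{W}\) and the two cosine similarities of Eqs. (2) and (3) can be sketched as follows. The function `dist` is passed in as a parameter because the text only says that it is provided by Word2vec; the toy table used in the demonstration, with a 2-word \(\varvec{U}\) instead of 30 words, is invented for illustration.

```python
import math
from typing import Callable, List, Sequence, Tuple


def profile(word: str, U: Sequence[str], dist: Callable[[str, str], float]) -> List[float]:
    """Numerical vector of a word: its dist to every u_j in U."""
    return [dist(word, u) for u in U]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is null)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarities(o: str, Q: str, w: str, U: Sequence[str],
                 dist: Callable[[str, str], float]) -> Tuple[float, float]:
    """Return (theta, beta) for one candidate w, following Eqs. (2) and (3)."""
    X = profile(o, U, dist)
    Qv = profile(Q, U, dist)
    W = profile(w, U, dist)
    theta = cosine(Qv, W)  # Eq. (2): query vs. candidate
    beta = cosine(X, W)    # Eq. (3): original word vs. candidate
    return theta, beta


# Toy dist values in [0, 1], invented for this example.
TOY_DIST = {
    ("o", "u1"): 1.0, ("o", "u2"): 0.0,
    ("Q", "u1"): 0.0, ("Q", "u2"): 1.0,
    ("w", "u1"): 1.0, ("w", "u2"): 0.0,
}
theta, beta = similarities("o", "Q", "w", ["u1", "u2"],
                           lambda a, b: TOY_DIST[(a, b)])
```

Because every component lies in \([0,1]\), both cosines are non-negative, which agrees with the bounds stated in Eqs. (2) and (3).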

This process is repeated r times, once for each word \(w = v_{k,i}\) in \({\varvec{V}}_k\), and similarities \(\theta _i\) and \(\beta _i\), \( i= 1,..., r\), are obtained for each \(v_{k,i}\), as well as the averages \(\langle \theta \rangle = \sum \theta _i / r\) and \(\langle \beta \rangle = \sum \beta _i / r\). The normalized ratio \(\left( \frac{\langle \theta \rangle }{\theta _i} \right) \) indicates how large the similarity \(\theta _i\) is with respect to the average \(\langle \theta \rangle \), that is, how close the candidate word \(w = v_{k,i}\) is to the query Q. The ratio \(\left( \frac{\beta _i}{\langle \beta \rangle } \right) \) indicates how reduced the similarity \(\beta _i\) is relative to the average \(\langle \beta \rangle \), that is, how far away the candidate word w is from word o of \(f_{o}\). A score \(Sn_i\) is obtained for each pair \((\theta _i, \beta _i)\) as

$$\begin{aligned} Sn_i = \left( \frac{\langle \theta \rangle }{\theta _i} \right) \cdot \left( \frac{\beta _i}{\langle \beta \rangle } \right) \, . \end{aligned}$$
(4)

The higher the value of \(Sn_i\), the better the candidate \(w = v_{k,i}\) complies with the goal of approaching Q and moving away from the semantics of \(f_{o}\). This goal aims to obtain the candidate \(v_{k,i}\) closer to Q, although still considering the context of \(f_{o}\). We use candidates with large values of \(Sn_i\) to replace the nouns. To replace verbs and adjectives, we want the candidate \( w = v_{k,i}\) closer to \(f_{o}\), so we choose among candidates with large \(S\mathrm{va}_i\), given by

$$\begin{aligned} S\mathrm{va}_i = \left( \frac{\theta _i}{\langle \theta \rangle } \right) \cdot \left( \frac{\langle \beta \rangle }{\beta _i}\right) \, . \end{aligned}$$
(5)

Finally, we sort the values of \(Sn_i\) (nouns) or \(S\mathrm{va}_i\) (verbs and adjectives) in decreasing order and choose, at random, from the highest three values, the candidate \(v_{k,i}\) that will replace the POS\(_k\) tag. The result is a newly generated phrase \(f(Q,N)\), of length \(N\) for query \(Q\), that does not exist in the corpora, but maintains the psychological mood (emotional content) of \(f_o\). The model is shown in Fig. 2.

Fig. 2. Model for semantic approximation based on geometrical interpretation.
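The scoring and selection steps can be sketched as below, with Eqs. (4) and (5) implemented exactly as written. The similarity values in the demonstration are invented, the function names are hypothetical, and the sketch assumes that every \(\theta _i\) and \(\beta _i\) is strictly positive, since a zero would make one of the ratios undefined.

```python
import random
from typing import List, Optional, Sequence, Tuple


def scores(thetas: Sequence[float], betas: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (Sn, Sva) for all candidates, following Eqs. (4) and (5)."""
    r = len(thetas)
    mean_theta = sum(thetas) / r
    mean_beta = sum(betas) / r
    sn = [(mean_theta / t) * (b / mean_beta) for t, b in zip(thetas, betas)]
    sva = [(t / mean_theta) * (mean_beta / b) for t, b in zip(thetas, betas)]
    return sn, sva


def pick_candidate(candidates: Sequence[str], values: Sequence[float],
                   rng: Optional[random.Random] = None, top: int = 3) -> str:
    """Sort by decreasing score and draw at random among the `top` best."""
    rng = rng or random.Random()
    ranked = sorted(zip(candidates, values), key=lambda p: p[1], reverse=True)
    return rng.choice([c for c, _ in ranked[:top]])


# Invented similarities for three candidates of one POS tag.
thetas = [0.2, 0.4, 0.6]
betas = [0.6, 0.4, 0.2]
sn, sva = scores(thetas, betas)

# For a noun tag the draw uses Sn; for verbs and adjectives it would use Sva.
chosen = pick_candidate(["a", "b", "c", "d"], [0.1, 0.9, 0.5, 0.7],
                        rng=random.Random(0))
```

Since Eq. (5) is the reciprocal of Eq. (4), \(Sn_i \cdot S\mathrm{va}_i = 1\) for every candidate, so the two rankings are exact reverses of each other.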

5 Evaluation and Results

A manual evaluation protocol has been designed to measure some characteristics of the phrases generated by our model. For baseline comparison, we used the model with the best results observed in experiments described in [11]. Although the baseline evaluation was done with a different scale for the evaluation parameters than in the current case, the comparison we present here helps us to understand the human evaluators’ perception of the generated sentences.

Sentences have been generated using the corpora 5KL, cGoethe, cPoe, cWerther and cCask, as explained in Sect. 4. The queries employed for generating the sentences in Spanish are, in English translation, {HATE, LOVE, SUN, MOON}. We show some examples of sentences generated in Spanish in our experiments, manually translated to English.

Sentences Generated Using cCask, cPoe and 5KL

  1. f(LOVE,12) = But I do not think anyone has ever promised against good will.

  2. f(HATE,11) = A beautiful affection and an unbearable admiration took hold of me.

  3. f(SUN,9) = So much does this darkness say, my noble horizon!

  4. f(MOON,9) = My light is unhappy, and I wish for you.

Sentences Generated Using cWerther, cGoethe and 5KL

  1. f(LOVE,11) = Keeping my desire, I decided to try the feeling of pleasure.

  2. f(HATE,7) = I set about breaking down my distrust.

  3. f(SUN,17) = He shouted, and the moon fell away with a sun that I did not try to believe.

  4. f(MOON,12) = Three colors of the main shadow were still bred in this moon.

The experiment we performed consisted of generating 20 sentences for each author’s corpora and the 5KL corpus. For each author, 5 phrases were generated with each of the four queries \(Q \in \) {LOVE, HATE, SUN, MOON}. Five people were asked to read carefully and evaluate the total of 40 sentences. All evaluators are university graduates and native Spanish speakers. They were asked to score each sentence on a scale of \([0-4]\), where \(0 = \) very bad, \( 1 = \) bad, \(2 = \) acceptable, \(3 = \) good and \(4 = \) very good. The following criteria were used in the evaluations.

  • Grammar: spelling, conjugations and agreement between gender and number;

  • Coherence: legibility, perception of a general idea;

  • Context: relation between the sentence and the query.

We also asked the evaluators to indicate if, according to their own criteria, they considered the sentences as literary. Finally, the evaluators were to indicate which emotion they associate with each sentence (\( 0 = \) Fear, \( 1 = \) Sadness, \(2 = \) Hope, \(3 = \) Love, and \( 4 = \) Happiness). We compared our results with the evaluation made in [11] as a baseline. The evaluated criteria are the same in that work and in this one. However, the evaluation scale in [11] is in a range of \([0-2]\) (\(0 = \) bad, \(1 = \) acceptable, \(2 = \) very good). Another difference is that, for the current evaluation, we have calculated the mode instead of the arithmetic mean, since the mode is more appropriate for ordinal data evaluated on a Likert scale.
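Aggregating ratings by their mode is straightforward with the Python standard library. The ratings in the sketch below are hypothetical and are not the data collected in this evaluation; resolving ties towards the lowest score is also an assumption, as no tie-breaking rule is stated.

```python
from statistics import multimode
from typing import Sequence

LABELS = {0: "very bad", 1: "bad", 2: "acceptable", 3: "good", 4: "very good"}


def criterion_mode(ratings: Sequence[int]) -> int:
    """Most frequent score on the [0-4] scale.

    Assumption: ties are resolved towards the lowest (most conservative) score.
    """
    return min(multimode(ratings))


# Hypothetical ratings given to one criterion (not the paper's data).
grammar_ratings = [4, 4, 3, 2, 4]
summary = LABELS[criterion_mode(grammar_ratings)]
```

Unlike the arithmetic mean, the mode always falls on one of the labels of the scale, which makes the aggregated result directly readable as a verbal category.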

In Fig. 3a, it can be seen that the Grammar criterion obtained good results, with a general perception of very good. This is similar to the average of 0.77 obtained in the evaluation of the model proposed in [11]. Coherence was rated as bad, against the arithmetic mean of 0.60 obtained in [11]. In spite of this unfavourable result, we can infer that the evaluators were expecting coherent and logical phrases. Logic is a characteristic that is not always perceived in literature, and we infer this expectation from the fact that many readers considered the sentences as not being literary (Fig. 3b). The evaluators perceived as acceptable the relation between the sentences and the Context given by the query. This score is similar to the average of 0.53 obtained in [11]. Although one might consider that this rate should be improved, it is important to note that our goal in the current work is not only to approach the context, but to stay close enough to the original sentence, in order to simulate the author’s style and to reproduce the psychological traits of the characters in the literary work.

Fig. 3. Evaluation of our GLT model.

In Fig. 3b, we observe that \(67\%\) of the sentences were perceived as literary, although this is a very subjective opinion. This helps us understand that, despite the low rate obtained for Coherence, the evaluators do perceive distinctive elements of literature in the generated sentences. We also asked the evaluators to indicate the emotion they perceived in each sentence. We could thus measure to what extent the generated sentences maintain the author’s style and the emotions that he wished to transmit. In Fig. 4a, we observe that hope, happiness and sadness were the most perceived emotions in phrases generated with cCask and cPoe. Of these, we can highlight Sadness, which is characteristic of much of Poe’s work. Although Happiness is not typical of Poe’s work, we may have an explanation for this perception of the readers. If we analyze The Cask of Amontillado, we observe that its main character, Fortunato, was characterized as a happy and carefree man until his murder, perhaps because of his drunkenness. The dialogues of Fortunato may have influenced the selection of the vocabulary for the generation of the sentences and the perceptions of the evaluators. In Fig. 4b, we can observe that sentences generated with cWerther and cGoethe transmit mainly sadness, hope and love, which are psychological traits easily perceived when reading The Sorrows of Young Werther.

Fig. 4. Comparison between the emotions perceived in sentences.

6 Conclusions and Future Work

We have proposed a model for the generation of literary sentences that is influenced by two important elements: a context, given by a word input by the user, and a writing mood and style, defined by the training corpora used by our models. The corpora cGoethe and cPoe, used to generate the association table in Phase II, capture the general mood and style of the two authors. The canned text technique applied to corpora cCask and cWerther, and the similarity measures and scores given by Eqs. (2), (3), (4) and (5), used to generate the final phrases of our generator, reflect the psychological traits and emotions of the characters in the corresponding works. The results are encouraging, as the model generates grammatically correct sentences. The phrases also associate well with the established context, and perceived emotions correspond, in good part, to the emotions transmitted in the literature involved. In the case of The Cask of Amontillado, the perceived emotions seem not to resemble the author’s main moods. This may be due to the fact that, in this short story, the dialogues of the main character are happy and carefree, the tragic murder occurring only at the end. When characters in a literary text have different psychological traits, the semantic analysis of the generated phrases may show heterogeneous emotional characteristics. Experiments with characters showing more homogeneous psychological traits, such as the melancholic, suicidal tendencies of Werther in The Sorrows of Young Werther, make it easier to detect emotions (such as sadness and love) associated with a dominant psychological trend portrayed by the author.

Although the Coherence criterion was poorly evaluated, it is possible to argue that coherence is not a dominant feature of literature in general, and most of the generated sentences were perceived as literary. The model is thus capable of generating grammatically correct sentences, with a generally clear context, that transmit well the emotion and psychological traits portrayed by the content and style of an author. In future work, we consider extending the length of the generated text by joining several generated sentences together. The introduction of rhyme can be extremely interesting in this sense, when used to produce several sentences to constitute a paragraph or a stanza [9]. We also plan to use and train our model to analyse and generate text in other languages.