1 Introduction

Authorship of the Homeric Poems is an issue paramount to the so-called Homeric Question. The origins of the latter can be traced back as early as the Alexandrian Grammarians in the 3rd century BCE and it is still ongoing.Footnote 1 At the center of the debate lies the composition process of both the Iliad and the Odyssey, taking into account their emergence through an oral tradition, as well as the circumstances surrounding their initial recording in writing and the subsequent development of their text. It is now widely agreed that the two poems are accretive works, with contributions from multiple poets, performers and, later, scribes and editors. As a result, scholars are sometimes quick to interpret any textual peculiarities as additions or interpolations that occurred during one stage or another. By performing authorship analysis using statistical language models on the two Homeric Poems, we seek to identify linguistically unforeseen passages using computational methods. This research builds on our earlier work (Fasoi et al., 2021), where we examined the Iliad and the Odyssey using their 24 books (for each poem) as a structural unit. It became apparent that, for a more in-depth and fruitful analysis of the two works, shorter excerpts should be used as units. There are, therefore, two research questions that we seek to address with this work:

  1. 1.

    Are the two poems, in their most part, linguistically uniform enough to be traced back to a single process/tradition that resulted in the vulgate and the subsequent medieval text?

  2. 2.

    Since we also know that their written tradition has been a dynamic process (Lamberton, 1997, p. 33), can we identify specific passages of the poems that stand out, pointing to interpolations, revisions or different provenance?

2 Background and related work

2.1 Authorship of the homeric poems

Few scholars, if any, would argue today against the oral tradition behind the composition of the Homeric Poems. The two works are widely believed to have been composed through a process of continuous reciting, additional composition and, possibly, improvisation. This oral poetry culture, documented in antiquity, survives still today, especially in less literate communities. It appears that the text of the Iliad and the Odyssey as it came down to us results from a two stage mechanism: a long oral poetic tradition involving an unknown number of poets/performers and a long textual tradition extending from the first written documentation of the poems to our early printed editions. In the middle of these two stands the actual process of initially writing down the poems.

Through the years and mainly in the last couple of centuries, the question of authorship focused primarily on whether they were originally composed by a single poet and later enriched and improved by performing bards, or they were a product of collecting and stitching together several smaller independent parts (some possibly preexisting oral formulaic compositions used by performers to improvise and memorize longer works). Both sides of the debate recognize the poems as an accretion by multiple contributors but they disagree mainly as to whether a single poet’s contribution (not necessarily Homer) stands out to the degree of attributing the works to him. If such a poet could be singled out, this person may have either composed an original poem which formed the core for later additions, or they acted as editor during the process of writing down what, until then, were oral compositions. Further questions challenge the uniformity of the poems, claiming that some parts (either entire books or smaller, yet substantial portions of them) are later interpolations inserted during the process of oral transmission.

The story of speculation and uncertainty regarding the origin of Homeric works does not end with the recording in writing of the poems. The textual transmission of the written text (that is the form of the poems as it developed from the time they were first written down, probably around the 6th century BCE, until our earliest complete manuscript in the 10th century CE) further perplexes any attempt for authorship analysis. The medieval text of both the Iliad and the Odyssey is remarkably stable and uniform. This means that, among the thousands of manuscripts that preserve these works, there are relatively few textual variations from one witness to the other. This stable medieval text represents a text type that is known as the vulgate, however elusive the term might be. However, the discovery of ancient manuscripts in the late nineteenth century (mainly papyri from the Ptolemaic and Roman periods preserving fragments of the poems) together with quotations from authors dating back to the classical times, indicate that the written text of Homer was once substantially different (Haslam, 1997).

Within the framework of its oral origins and a written tradition scarcely documented in its earlier stages, a wide spectrum of theories seek to explain inconsistencies and peculiarities. No single theory has been universally accepted and many scholars feel that the debate is not as fruitful anymore—perhaps even in deadlock—for lack of newer or more definitive evidence.Footnote 2 With this in mind, we hope that digital methods and, more specifically authorship analysis, could open up new paths of research for Homerists to pursue.

Because it is unlikely that a single author is behind each poem, any authorship attribution/verification technique may seem immaterial. Yet, such a study is still relevant. Through centuries of oral performance, memorization and repetition, individual passages became gradually smoothed into what became a reasonably integrated entity. New additions and alterations often reflected one common tradition (or more closely related ones) of oral poetry techniques albeit bearing the individual stamp of each contributor. The same holds true regarding the written stage of the transmission. Whether it was written down from dictation as a fully formed text, or it was a gradual process that combined many versions and additions for years (Nagy, 2004) before it eventually begun resembling the vulgate, on which the unusually uniform medieval tradition is based, is irrelevant to the question that we attempt to answer here. Our quest is for insertions of large portions of text, of the type that urged some scholars to doubt their authenticity (Fowler, 2004).

This research does not aim at attributing passages to different people or traditions, especially in the oral stage of the transmission. We merely hope to identify passages that present greater or lesser linguistic proximity with each other or with the rest of the poem as a whole. Our expectation to detect such passages stems from the belief that the text that we now possess possibly bears (in a somewhat raw, less integrated form) contributions that were more recent, or originated from a distinct, more distant tradition, or perhaps they were added in the written phase of the text by a scribe or editor. Any such passages identified by our models may provide new evidence and fuel afresh the discussion regarding the history of the Homeric text.

2.2 Related work

In this work, we performed a computational authorship analysis of Homeric texts. We use the term computational authorship analysis to refer to the study of any association between a corpus and a group of authors. When only a single author is involved and when the association concerns attribution, then the task is authorship attribution or verification (Kestemont et al., 2012). Authorship attribution is the problem of identifying the author of an anonymous text, or a text whose authorship is in doubt (Love, 2002). In authorship verification, the aim is to determine whether two texts should be attributed to the same author or not (Kestemont et al., 2012). In the present study we examined the case of Homer, a leading poet of antiquity, and we studied the extent to which character-level statistical language models, trained on Homeric data, are able to detect whether other ancient texts bear the same authorship, conventionally attributed to Homer. Although computational authorship analysis is understudied in literature, it relates to computational authorship attribution and verification. Hence we discuss below research work addressing these two tasks.

Looking back at historical data, we detect the first attempts to authorship attribution in 18th-century Europe regarding the works of the English poet and playwright William Shakespeare (Mikros, 2015). Subsequently, in the 19th century, the same question was posed about one of the most important problems in the field of Classical Studies, namely the issue of dating the Platonic dialogues (Mikros, 2015). In the 20th century, a famous example of authorship attribution concerns the US Federalist Papers (Hamilton and James, 1787), 12 of which are probably written by Alexander Hamilton and James Madison (Holmes & Forsyth, 1995). Today, in the 21st century interest in this field of research is undiminished and growing, especially in computer science. Authorship attribution finds fertile ground in a variety of applications, including, among others, the process of detecting plagiarism (Kimler, 2003), forensic investigation - in the case of identifying the author of anonymous documents (Chaski, 2005) - and the process of detecting phishing e-mail (Gollub et al., 2013).

Quantitative authorship attribution concerns the employment of computational methods to address or assist with the task of authorship attribution. Methodologically, textual measurements in the anonymous (or disputed) text can be compared to their corresponding values in possible authors’ writing samples, in order to determine which author’s writing sample is the best match (Grieve, 2007). Textual measurements could comprise length of words/sentences or frequency computations, such as the frequency of graphemes, sequences, words, words at the beginning/end of a sentence, punctuation marks, and more. Put more broadly, quantitative authorship attribution is based on “stylistic features”, which can be divided into the following four categories, based on their linguistic level of affiliation: phonological or graphimatical, lexical or morphological, syntactic, and semantic (Stamatatos, 2009).

Kestemont et al. (2012) repeated feature subsampling and a group of impostor authors in order to attribute a newly discovered Latin text from antiquity (the Compendiosa expositio) to the well-known North African author Apuleius. Their results confirmed philological analyses, which place the text chronologically probably in the 2nd century CE, i.e., within Apuleius’ lifetime, and present a close inter-textual relationship between the Expositio and the works of Apuleius.

Language models (LMs) are the cornerstone in various areas of NLP, such as automatic translation (Schwenk et al., 2006; Luong et al., 2015) and grammatical error correction (Yannakoudakis et al., 2017). Regarding the task of authorship attribution, LMs have been applied in the study of Peng et al. (2003). A wide range of features has been applied to the task of authorship attribution, and many studies have concluded that the use of character n-grams - which are part of the phonological / graphimatical “stylistic features” - results in several cases efficient. Specifically, the study of Peng et al. (2003) used methods of estimating the performance of authorship attribution by statistical LMs. Language modeling can be performed at the word or the character level. Authorship attribution schemes can be performed with a word-level analysis (Peng et al., 2003). However (Peng et al., 2003) also note that character-level statistical LMs perform better than the respective word-level models. Furthermore, in several languages, such as Asian (e.g., Japanese, Chinese, etc.), word limits are not clear and therefore difficult to identify in the text (Peng et al., 2003). Character-level statistical LMs find application in every language, even in non-linguistic sequences such as music (Doraisamy and Rüger, 2004) and DNA (Doraisamy & Rüger, 2004). At the same time they are widely used in data mining applications (Witten et al., 1999).

Regarding studies determining the performance of authorship attribution, it is worth mentioning those concerning the period of classical antiquity. One such case is a study of author verification through Vector Space Models (Salton et al., 1975) in the work Commentarii de Bello Gallico by Julius Caesar (100-44 BC). While the work is traditionally attributed to Caesar, it is speculated that Aulus Hirtius, one of his most trusted generals, contributed significantly to the writing of the corpus (Kestemont et al., 2016). Another notable case is the authorship attribution study of (Manousakis & Stamatatos, 2018) in the tragedy Rhesus (453 BC). The authors trained Support Vector Machines (SVM) (Hearst et al., 1998) with character trigrams and fourgrams on a corpus comprising Aeschylean, Sophoclean, Euripidean and Aristophanean dramas, from the 5th to the 4th centuries BCE. The model was trained to detect the correct author and, in order to capture in-play variability, multiple segments were used per play as input. Their model managed to attribute correctly all the tested plays with no error. Interestingly, the authors also included the 24 Iliad books in their training set, based on the hypothesis that Rhesus is partly a dramatization of the 10th Book of the Iliad. However, they found no affinity with the Iliad text. Another outstanding case is the study of computational authorship attribution on two Latin works by Monk of Lido and Gallus Anonymous (Kabala, 2020). Kabala (2020) used Bray–Curtis distance and logistic regression achieving a classification accuracy of 99.86%.

3 Methodology

Our work employs language modeling as a means to analyze the Iliad and the Odyssey. We trained a statistical character-level language model on one part of the poem (Iliad or Odyssey) and computed its average predictive power on a fragment from the remaining part, which was used as a measure of linguistic proximity between the (unseen) fragment and the (seen) poem.Footnote 3

3.1 Statistical language modeling

Character-level statistical language modeling (Jelinek, 1997; Jurafsky and Martin, 2000) is based on the Markov assumption, under which the probability of the next character is computed given only a fixed number of preceding characters. The counts of all sequences of n characters (n-grams) are calculated over a corpus. Using a part of the preface of the first Book of Iliad as an example, the trigrams (n = 3) of: ‘ ’ will be: , followed by , followed by , and so on until the final . Then, a probability distribution is modelled over all the possible characters that following each sequence of n − 1 characters (k-gram, with k = n − 1) observed in the corpus:

$$P(c_{i}; c_{i-1}, \dots, c_{1}) = P(c_{i}; c_{i-1},\dots,c_{i-n+1})$$

A 2-gram (bigram) model will only consider the previous character ci− 1 to predict a next character ci. And ci will be the most frequently occurring character in the corpus following ci− 1. Probabilities are formed using the maximum likelihood estimation, changing Eq. 1 to:

$$P(c_{i}; c_{i-1} ) = \frac{C(c_{i-1} c_{i})}{C_{i-1}}$$

where C are the counts of the gram. In this work we used trigrams (n = 3), which showed to achieve the best results for n = 2,3,4 in preliminary experiments.

Unseen sequences of characters can be addressed with smoothing techniques (Jurafsky & Martin, 2000). However, smoothing is ineffective in authorship attribution with statistical language modeling (Peng et al., 2003) while out-of-vocabulary issues are expected to play much less of a role with character- than with word-level language modeling. Hence, no smoothing was added to our models in this work.

3.2 Evaluation measures

Provided with a language model that is trained on a corpus, Perplexity (PPL) is the inverse probability of a sequence, normalized by the number of items in the sequence (Jurafsky & Martin, 2000). For a sequence of N words, PPL of a bigram word-based language model is defined as:

$$PPL(S) = ({\prod\limits_{i}^{N}}{\frac{1}{P(c_{i}; c_{i-1})}})^{-\frac{1}{N}}$$

PPL is most often used as an indicator of how well a language model predicts an excerpt (Goodfellow et al., 2016). This measure, however, is most often reported for word-level language models. Perplexity for character-level language can also be defined, based however on Bits Per Character (Hwang & Sung, 2017), an equivalent in spirit measure to PPL (Dror et al., 2020). BPC is also known as average cross entropy and (Graves, 2013) defines it as the average negative log probability of the characters of the fragment under investigation. In other words, for a given timestep t of a character sequence \(c_{1} {\dots } c_{N}\), the BPC of a model m is \(-log_{2}(P(c_{t+1}; m, c_{1} {\dots } c_{t}))\). Hence, BPC encodes the negative log probability that m would generate ct+ 1 if the preceding characters were c1...ct. Consequently, BPC for the whole sequence can be defined as:

$$BPC(S) = -\frac{1}{N}{\sum\limits_{1}^{N}}{log_{2}(P(c_{t+1}; m, c_{1} {\dots} c_{t}))}$$

As suggested by Graves (2013), Perplexity for a character-level language model can then be defined as:

$$PPL(S) = 2^{\lvert\overline{w}\lvert*BPC(S)} = 2^{-\frac{\lvert\overline{w}\lvert}{N}{{\sum}_{1}^{N}}{log_{2}(P(c_{t+1}; m, c_{1} {\dots} c_{t}))}}$$

where \(\lvert \overline {{w}}\rvert\) represents the average number of characters per word in the corpus.

In this work we employ PPL, as defined in Eq. 5, with lower values indicating a higher linguistic proximity between an unseen excerpt and the corpus that the model was trained.

3.3 Statistical significance

To measure the statistical significance of our measurements we applied a non-parametric statistical significance test. After sampling control fragments Ccntrl from Iliad (Odyssey), we train a statistical character language model on the remaining Iliad (Odyssey) and we compute the PPL scores of the model when it is applied on randomly-sampled text fragments cC. We also compute the PPL scores of the model when it is applied on equally sized fragments from an unseen text Cnew under investigation. A Wilcoxon signed-rank paired test (Wilcoxon, 1992) is then calculated between the two measurements, investigating the (null) hypothesis that the latter scores are no higher than the former.

4 Empirical analysis

By using character-level statistical language models (LMs; see Section 3.1), we analyzed the linguistic proximity between each Iliad/Odyssey book and the rest, as well as the pairwise book correlation, with regards to their association with the rest of the books per poem. Prior to this analysis, provided in Section 4.3, and in order to explore the potential of our models in understanding Homeric language, we performed a text classification and an authorship verification task. Text classification was performed by training one language model on the Iliad and one on the Odyssey, and then comparing their predictions on unseen excerpts taken from the two poems. The resulting system classified each test fragment as belonging to one or to the other poem, based on the lowest PPL (5). The classifier’s predictions were then compared against the responses of human experts, following a survey-based approach. Authorship Verification was performed by comparing the PPL of our Homeric LMs when they were presented excerpts from Hesiod’s ‘Theogony’ and ‘Works and Days’,Footnote 4 with that achieved when they were presented ones from the Iliad and the Odyssey.

4.1 Poem classification

We provided 23 students and graduates of (Greek) philology departments with a questionnaire.Footnote 5 Out of all who volunteered, five were undergraduate students of philology, three were graduates of philology, ten were postgraduate students with Bachelor in Philology, and five were teachers of philology in secondary education.Footnote 6 We asked the volunteers to classify excerpts attributed to the epic poet Homer correctly. Excerpts of the questionnaire ranged from 580 to 610 characters. Six verses were used from the Iliad and six from the Odyssey, carefully selected so that their content does not give away their provenance. To diffuse the dual thematic content (Iliad/Odyssey) and to enhance thematic variety we also selected and added eight excerpts from the so-called Homeric hymns, vis. two from “To Demeter”, two from “To Apollo”, two from “To Hermes” and two from “To Aphrodite”. The addition of these eight excerpts also aimed to discourage human annotators from reverting to content instead of language for their classification. No search for sources, such as in books or on the Internet, was allowed.

Fig. 1
figure 1

Example question of the questionnaire. The given excerpt is taken from the Homeric hymn “To Demeter”

As can be seen in Fig. 1, for each excerpt the annotators could choose one from the following options:

  • yes (Iliad), if they considered that the given excerpt belongs to Iliad;

  • yes (Odyssey), if they considered that the given excerpt belongs to Odyssey;

  • yes if they considered that the given excerpt belongs to the Homeric Poems, but they were not able to determine which of the two poems (Iliad/Odyssey) it belonged to;

  • no, if they considered that the given excerpt does not belong to the Homeric Poems (Iliad/Odyssey), but they considered that it could be attributed to the poet Homer due to linguistic, mythological, historical or other elements that led them to this answer;

  • other, if they considered that the excerpt does not belong to any of the above options.

Fig. 2
figure 2

Accuracy (higher is better) of human experts (in blue) and HLMTC (in red) on Iliad/Odyssey classification of the respective questionnaire excerpts

The predictions of the human experts were compared with those of a text classifier that was based on two Homeric LMs, dubbed HLMTC. For each of the 12 Iliad and Odyssey excerpts that were included in the questionnaire, two PPL scores were computed, one from an Iliad and one from an Odyssey LM. The poem with the lowest score was used to label the excerpt. Books, excerpts of which were included in the questionnaire, were removed from the training data of the LMs. More specifically, Iliad Books 3, 8, 10, 11, 15, and 21 were removed. Similarly, Odyssey Books 3, 6, 10, 13, 18 and 21 were removed. Figure 2 presents the accuracy of all human experts (in blue bars) along with the performance of HLMTC (in red). HLMTC failed to classify correctly two out of the 12 excerpts, achieving an accuracy of 83%. By contrast, most of the human experts scored lower on accuracy, some with a significant difference.

4.2 Authorship verification

4.2.1 Genesis A or B?

Following the work of Neidorf et al. (2019), we used Genesis, one of the longest extant Old English poems, which is also known to be the work of multiple authors. A later poem, called Genesis B (30,406 characters long) is embedded within the approx. 2300 lines of the older main poem (Genesis A; 97,628 characters). We trained 100 LMs on Genesis A and we compute 100 PPL scores on randomly sampled (unseen during training) excerpts of Genesis A and 100 PPL scores on equally sized randomly sampled excerpts of Genesis B. The unseen excerpts are 600 characters long. To create each training dataset, we merged the text on the left with the text on the right of the respective unseen excerpt, by ignoring 6000 characters from the left and the same number from the right of the excerpt. A Wilcoxon test (Wilcoxon, 1992) then rejected the null hypothesis that Genesis A and B are indistinguishable with a value of 0.009 (a = 0.05). In other words, language models trained on Genesis A can tell if an unseen excerpt is from Genesis A or B.

4.2.2 Homer or Hesiod?

In a similar experiment, we investigated the hypothesis that a Homeric LM can distinguish an unseen excerpt composed by Homer, even when it is presented excerpts by poets of a similar period and genre. Following the same experimental settings, we trained LMs on the Iliad and the Odyssey, which we used to score (unseen) excerpts (a) from the Iliad and the Odyssey and (b) from Hesiod’s ‘Works and Days’ (36,203 characters long) and ‘Theogony’ (45,813 characters). We observe that both Iliad and Odyssey LMs can distinguish parts from Hesiod’s works with a statistical significance (Table 1; P values below 0.05). Given the lack of other contemporary literature, Hesiod’s works are probably as close a text as we have today.Footnote 7

Table 1 P values of Wilcoxon test when training character language models on Iliad and Odyssey (vertically) and computing BPC scores on Hesiod’s work (horizontally)

4.3 Authorship analysis

Fig. 3
figure 3

Average perplexity scores (PPL; vertically) measured using the 24 language models per Homeric Poem. Each model is trained on all the books but one, and PPL is computed on 1000 excerpts taken from the remaining book (x-axis) 

The promising aforementioned results motivated our LM-based analysis, following described.

4.3.1 Leave one book out language modeling

We developed 24 language models per poem, using excerpts from all but one book of the same poem. For each such model, then, we report the average PPL score computed on one thousand samples of 600 characters each, taken from the remaining book. We only used the text between the 2600 and the 9600 characters per book, to train each language model, in order to control the varying length of the books. Also, we excluded the beginning of each book (as well as the end) to avoid any sampling bias and to soften the effects from any arbitrary book division.

Figure 6a and b present the average PPL scores for the Iliad and the Odyssey respectively, which can be used to approximate the overall linguistic proximity of any book with the rest per poem. In the Iliad, the greatest proximity (least PPL) was observed for Books 11 and 17. The least proximity (highest PPL) was observed for the 4th and 9th Book. In the Odyssey, the 9th and 12th books were the ones least linguistically associated while the 1st and the 16th were the ones most strongly associated with the remaining books.

4.3.2 Book language modeling

We trained one LM per book and for each other book in the respective poem we computed a PPL score (average across ten 600-character-long samples). By employing Pearson’s rho correlation, Fig. 4 presents two heatmaps, one per poem. Most of the book pairs are positively correlated (red squares), but exceptions are present. In Fig. 4a, for example, the 5th Book of the Iliad has a negative correlation with the 9th Book (9th row from the top, 5th column) while the 1st, the 9th and the 24th Book are negatively correlated with multiple books. In Fig. 4b, we observe a more interesting pattern. The 9th, 10th, and 12th books are negatively correlated with most other books (most cells in blue) except from each other (the red box in the 10th row) while the 11th Book is weakly positively correlated with the 10th and less (or not) correlated with all other.

Fig. 4
figure 4

Correlation (Pearson’s rho) heatmaps produced by training one language model per book and computing the PPL scores resulted by applying the model on all the poem’s books

5 Discussion

Despite any obvious shortcomings, the books (rhapsodies) initially served as the obvious structural unit for our study. Addressing suspicions raised by scholars for centuries that some (or large parts) of the books were interpolations, we evaluated the linguistic proximity of each book with the rest of the poem that it comes from. The assessment of the poems on the basis of the book was by no means void of interesting results. The two heatmaps of Fig. 4 show how any two books of the same poem correlate to each other, with regards to their relation to the rest of the poem’s books. The first impression is that the Iliad is a less uniform work compared to the Odyssey. This impression stems from fewer strongly positive correlations (designated red) not as densely concentrated compared to Odyssey. In the latter, the majority of relations are positive (designated red), and they are so by a greater degree (dark red). Overall, in both poems positive correlation between books outnumber negative ones.

Four Iliad books present the highest number of negative correlations with the rest, vis. I.1, I.2, I.9, and I.24 (see Fig. 4). Of these, I.2 appears to break away from the rest, as its negative correlation to the 20 remaining books is not complemented with a distinctively positive one to I.1, I.9, and I.24. These three books are not only strongly correlated to each other, but they also associate negatively with the same books (most notably with I.12 and I.13). I.9 only slightly falls behind the other two as far as the degree of both positive and negative correlations are concerned. Thus, it seems reasonable to argue for a closer relationship among Books 1, 9, and 24, but there is no case for Book 2 joining the group. The separation of I.2 from most other books (relatively weak positive correlations are with Books 1, 9, 10, and 14, but negative with most of the rest) can hardly be irrelevant to a sizeable passage from the catalogue of ships that it contains, which was highlighted by our models (with a PPL above the top 95%) as stylistically diverge. We discuss this passage in Section 5.1, but we expect that further experiments on this particular section of I.2 might offer additional evidence regarding the correlation between parts of I.2 and the other books.

The Odyssey exhibits an even clearer pattern of interrelations between its books. Books 9, 10, and 12 not only demonstrate among them very strongly positive correlations, but they also exhibit consistently negative correlations with all the other books of the Odyssey (and only marginally positive with O.5). These three books, therefore, appear to be the most substantial portion of text diverging linguistically from the rest of the Odyssey. We cannot disregard the fact that Book 11 (which is the only bit interrupting the continuity of the Books 9-12 group) is also linguistically at odds with many of the other books, although it shows no particular affinity with Books 9 and 10 and only narrowly a positive correlation with O.12. In fact, these four books, 9-12, contain a very distinct part of the Odyssey: the narration of Odysseus’ adventures during his journey. The preceding eight books describe Telemachus’, his son’s, quest to find his father and Odysseus’ arrival at Ogygia (where he is stationed when narrating his quest in Books 9–12). In Book 13, Odysseus leaves Ogygia and reaches Ithaca, where the rest of the story takes place. Thus, the portion of the poem contained in Books 9–12 is somehow a self-contained (if not independent) part of Odyssey’s story. Book 11, discussed above, which deviates somewhat from the pattern set by the other three, is a particular part of Odyssey’s story, because it takes place almost exclusively in Hades, the underworld, where Odysseus engages in conversations with several dead characters. Here too our models identified a substantial passage of approximately 100 verses (discussed below in Section 5.1), which may be affecting the results shown in our heatmap (Fig. 4).

Measuring PPL per book produced results with considerable fluctuation among evaluated excerpts from the same books (Fig. 3). This wide variation dictated our decision to break the evaluated passages down to 600 characters. Although ancient in origin, the division of both the Iliad and the Odyssey in books is substantially later than their original composition and it does not reflect natural breaks in terms of content and, as a result, perhaps also of language (p. 57-59 ; Haslam, 1997). It seems rather clear, though, that they were not originally meant as independent units/sections. For the purpose of our research, the considerable length of all 48 books is also of concern, since it is unlikely that any of the books in its entirety represents a single distinct poetic tradition or interpolation. Evidence for these have been sought for by scholars in passages as short as a few words. Therefore, if we seek to identify linguistic oddities, we need to examine shorter units.

5.1 Passages isolated by the models

Fig. 5
figure 5

Perplexity (vertically) measured over a rolling window of 600 characters (horizontally), using a language model trained on the rest of the poem. The red line depicts the upper 95% confidence interval of perplexity (PPL)

We trained one character-level statistical language model for each 600-character excerpt in our poem, by using the remainder of the poem as our training set and the excerpt to compute a PPL score. Figure 5 presents the resulted two PPL time series, from the first to the last excerpt of each poem. A higher PPL score indicates that the passage is linguistically remote from the rest of the poem, or possibly even the work of a different author. We excluded 60,000 characters from each side of the fragment,Footnote 8 in order to make the excerpt’s prediction task more challenging. Confidence intervals 95%, the upper one of which is shown in red, were computed per poem by randomly sampling 1000 fragments of 600 characters each, then computing PPL with a model trained on the remainder (minus the same 12,000-character-long window).

Forty-seven such excerpts of 600 characters each were scored above the upper 95% among all other excerpts.Footnote 9 Twenty-five were from the Iliad and 22 from the Odyssey. In most cases, these excerpts are picked out by the models independently, with the immediately preceding and following excerpts not making the 95% threshold. This implies that the stylistically deviant text identified is relatively short. In some cases, however, the models pointed out groups of consecutive (or almost consecutive) excerpts. Fifteen out of the 25 Iliadic excerpts come from only two books: ten from Book 2 (mostly consecutive) and five from Book 18 (in two groups of consecutive passages and one solitary excerpt). In the Odyssey it appears that more books are represented by two or more excerpts: three from Book 5, another three from Book 9, six from Book 11 and three from Book 12. In most cases, however, the passages are remote from each other or, at best, consisting of up to two consecutive 600-character excerpts. The striking exception comes from Book 11, from which six consecutive excerpts were picked out by the model.Footnote 10

The identification of these two lengthier passages touches upon well-known debates: the authenticity of The Catalogue of Ships (I.2) and that of The Book of the Dead (0.11), both discussed below. The PPL per excerpt of each book is shown on Fig. 6.

Fig. 6
figure 6

Perplexity (vertically) of 600-character long excerpts taken from I.2 and O.11 (horizontally), using a language model trained on the rest of the respective poem. The red line shows the model’s (Control) PPL on an unseen 600-character-long excerpt taken from the respective poem

5.1.1 The catalogue of ships (I.2)

Book 2 of the Iliad is known as “The Great Gathering of Armies”, which includes the famous catalogue of ships (2.494–759). From I.2 ten excerpts scored a PPL higher than the upper 95% confidence interval. Two of these excerpts (consecutive) come from the earlier parts of the book (Odysseus’ speech at the assembly) and eight of them (not consecutive) come from the actual catalogue of ships. None comes from the catalogue of the Trojan forces at the end of the book. The whole catalogue has been contested since antiquity regarding its authenticity. It is in fact one of the most disputed parts of the Iliad, with many theories attempting to explain geographical and chronological discrepancies.Footnote 11 Since our models did not mark out the entirety of the catalogue, we assessed the PPL scores for the rest of the book (Fig. 6a) and found that almost all the 600-character excerpts from the catalogue of the ships (beginning from verse 2.494)Footnote 12 score consistently above the model’s control PPL all the way up to the end of the passage (2.759), dropping immediately after that. For the sake of completeness, a breakdown of the passages shows that our models consider more linguistically distant four particular passages. The first spans from the beginning of the catalogue (2.494) up to, but excluding, the Athenians (2.546); the second passage refers to the Messenians (2.589-603); the third one includes the Cephallenians, the Aetolians and the Cretans (2.632-646); the last group consists of two excerpts from the last part of the catalogue (some places of origin not named), namely verses 2.718-733 and 2.748-762, which marks the end of the catalogue. We need to stress, however, that many of the in-between sections of the above passages also scored a relatively high PPL and diverge from the rest of the poem.

5.1.2 The Book of the Dead (O.11)

This is the single longest uninterrupted portion of text identified by the models as linguistically distant from the rest of the poem. It consists of 101 verses almost uninterrupted (11.230-331) and it coincides almost exactly with the introduction to yet another catalogue, the so-called Catalogue of Heroines (11.225-332). This particular passage is deemed an interpolation by many scholars in contrast to the actual Catalogue of Heroines, which is not disputed.Footnote 13 The other excerpt from Book 11 set aside by the model is also part of a passage considered by Cassio (2002) to be interpolation (verses 574-588).

Like with the catalogue of ships, here too we assessed the PPL of all the excerpts from O.11 (Fig. 6b). Almost the entirety of the book scores higher than the model’s control PPL. But the peaks observed between passages 17 and 22 correspond to the introduction to the Catalogue of Heroines. Another single peak at the end of the book marks Odysseus’ meeting with Hercules and the description of the latter’s belt (11.601-615). A third instance of two peaks at the beginning of the book corresponds to 11.59-100, which includes part of Odysseus’ meetings with Elpenor and Teiresias.

6 Conclusions

To perform any form of authorship analysis on the Iliad and the Odyssey, we need to verify that our character-level statistical language models (LMs) recognize them (at least in their most part) as distinct works of literature. In this work, the Iliad and the Odyssey were examined and evaluated both internally (i.e., examining the relations among their structural parts/books for each work), and against other similar texts.

By performing an authorship verification task we showed that our LMs can adequately differentiate between each of the Homeric poems and two of the most similar to them works available: Hesiod’s Theogony and Works and Days. In a different experiment, we established that our models can be used to classify excerpts as belonging to the Iliad or the Odyssey. Students and scholars of Greek literature were asked to complete a questionnaire and their responses were compared against our simple in nature LM-based classifier, showing that the latter performs better than most annotators and comparably to the best.

We then used our LMs to analyze Homeric books and fragments. Based on the former, we showed that I.1, I.7, O.1 and O.16 have the most linguistic proximity while I.4, I.9, O.9, and O.12 have the least linguistic proximity with the rest of the books of the respective poem. Also, we showed that strong correlations exist between books, with regards to their relations to other books. By analyzing these correlations we observed that O.9, O.10, and O.12 are strongly associated with each other while least associated with the rest. Moving away from the book level, we studied the linguistic proximity of fragments with the rest of each poem, showing that passages such as the catalogue of ships (I.2) and the Book of the Dead (0.11) appear to have a low linguistic proximity with the remainder of the poem.

Future work will analyze marked out passages at the verse level. Also, we will take into account formulas and repetitions traced back to oral traditions and we will examine textual matters, such as the “plus verses” in ancient manuscripts, attempting to revisit existing debates with a computational approach.