Western classical music development: a statistical analysis of composers similarity, differentiation and evolution

This paper proposes a statistical analysis that captures similarities and differences between classical music composers with the eventual aim to understand why particular composers ‘sound’ different even if their ‘lineages’ (influences network) are similar or why they ‘sound’ alike if their ‘lineages’ are different. In order to do this we use statistical methods and measures of association or similarity (based on presence/absence of traits such as specific ‘ecological’ characteristics and personal musical influences) that have been developed in biosystematics, scientometrics, and bibliographic coupling. This paper also represents a first step towards a more ambitious goal of developing an evolutionary model of Western classical music.


Introduction
This paper has two objectives. First, the paper contributes to the music information retrieval literature by establishing similarities between classical music composers. 1 That two composers, or their music, 'sound alike' or 'sound different' is inherently a subjective statement, made by a listener, which depends on many factors, including the degree of familiarity with classical music per se. 2 This paper addresses this subjectivity issue, using well-established similarity indices (e.g., the centralised cosine similarity measure) based on measurable criteria. Even if no audio file is used in the analysis, 'sounding alike' is used in this paper as a proxy (or shortcut) with the specific meaning that the music of two composers is similar in ecological/musical characteristics and/or personal musical influences (as defined below). Uncovering what makes two composers similar, in a systematic way, has important economic implications for (1) the music information retrieval business; (2) a deeper insight into musical product definition and choice offered to music consumers and purchasers; and (3) our understanding of innovation in the creative industry.
This leads to a second objective of the paper, which is to propose a statistical framework that could identify transitional figures, innovators and followers in the development of Western classical music. Western classical music evolved gradually, branching out over time and throwing off many new styles. This overall development is not due to simple creative genius alone, but to the influence of past masters and genres, as constrained or facilitated by the cultural conditions of time and place. Figure 1 conveys this development and proposes a (narrow) historical timeline for music periods (e.g., Medieval, Renaissance, Baroque, Classical, Romantic and Modern/twentieth century) and some composers belonging to these periods. 3 Along vertical lines are composers who have developed and perfected (or pushed to the limit) the musical style of their period. Other composers (not necessarily shown) may gravitate around them, extending the volume of music production in an essentially imitative style. Along the diagonal line are some 'transitional' and/or 'innovative' composers whose works (or at least some of them) have been assessed by musicologists to contribute to a transition from one period to another. 4 As claimed by Gatherer (1997), ''a dialectical approach to music evolution would seek to identify the internal stylistic tensions and contradictions (in terms of thesis and antithesis) which give rise to new musical forms (synthesis).'' Franz Brendel (1811-68), a doctor of philosophy, was the first self-consciously Hegelian historian of music and, according to Taruskin and Gibbs (2013), henceforth T&G (2013), his great achievement was to write the nineteenth century's most widely disseminated history of music. 5 Brendel casts his narrative in terms of successive emancipations of composers and the art of music (emancipation from the sacred, emancipation from words, etc.).
For T&G (2013), through this Hegelian approach, ''many people have believed that the history of music has a purpose and that the primary obligation of musicians is not to meet the needs of their immediate audience, but, rather, to help fulfill that purpose-namely, the furthering of the evolutionary progress of the art. This means that one is morally bound to serve the impersonal aims of history, an idea that has been one of the most powerful motivating forces and one of the most demanding criteria of value in the history of music. (…). With this development came the related views that the future of the arts was visible to a select few and that the opinion of others did not matter.'' This Hegelian perspective claims to show why things changed. This makes it fundamentally different from Darwin's theory of biological evolution based on random mutation. Change or evolution in the Hegelian approach is viewed as having a purpose, which turns random process into a law.
For Gatherer (1997), ''a Darwinian alternative to dialectics, which in its most reductionist form is known as memetics, seeks to interpret the evolution of music by examining the adaptiveness of its various component parts in the selective environment of culture.'' The diagonal in Fig. 1 (and the identified composers along the diagonal) could represent a somewhat lengthy process of music 'speciation', so to speak (in analogy to evolutionary biology). 6 Darwinian biological models have been applied to many aspects of cultural evolution (see Linquist 2010 for one good survey), but not so much to music (see, however, Gatherer 1997; Jan 2007). An evolutionary approach to classical music could perhaps be narrated along the following lines. Music transmission is analogous to genetic transmission in that it can give rise to a form of evolution by selection. By planting a fertile 'meme' in another composer's mind, the initial composer manipulates that composer's brain, turning it into a vehicle for the meme's propagation. 7 Composition imitation is how musical memes can replicate. However, the inherited music style adapts to local ecological and social conditions by a process of musical mutation/variation and differential fitness that is akin to natural selection. 8 Just as not all genes that can replicate do so successfully, so some music memes are more successful in the meme-pool than others, leading to a process of 'musical' (instead of natural) selection, a non-random survival of random musical mutations. In other words, musical memes are passed on in an altered form, through musical mutation and speciation, branching out over time into many new and diverse styles.
Footnote 4 continued Hasse, the dramma giocoso/opera buffa of Galuppi and Pergolesi, and the opera reforms of Gluck and Jommelli). The case for putting de Muris and de Vitry on the diagonal is perhaps worth mentioning. They were both mathematicians and musicians. According to T&G (2013), their treatises and the debates they sparked, and their notational breakthroughs and innovations had enormous repercussions. ''So decisive were their contributions that this theoretical tradition has lent its name to an entire era''-Ars Nova (which is also the title of the Treatise of de Vitry).
5 The title of the book can be translated as: ''History of Music in Italy, Germany and France-From the Earliest Christian Times to the Present.'' See: https://archive.org/stream/geschichtedermus01bren#page/n5/mode/2up.
6 This kind of approach is conceptually not much different from understandings of biology and phylogeny through the study of genetics, whereby one identifies the lineages and the way a population splits over time into new species.
7 Defining the 'meme' replicator as a unit of cultural transmission or a unit of imitation, Dawkins (1976) suggests that ''[j]ust as genes propagate themselves in the gene pool by leaping from body to body via sperms or eggs, so memes propagate themselves in the meme pool by leaping from brain to brain via a process which, in the broad sense, can be called imitation.''
That said, the present paper does not go deeply into any 'pseudo-scientific' metanarrative for Western classical music evolution. Rather, and more modestly, it proposes a statistical analysis that captures similarities and differences between classical music composers. The eventual aim is to increase our understanding of why particular composers 'sound' different even if their 'lineages' (or personal influences network) are similar, thereby contributing to an evolution in Western classical music. Musicologists and music historians have described and classified composers, the styles and the periods in which they lived. They have discussed the relationships and influences network of composers, the evolution of music styles, and who they see as transitional figures, innovators, or followers. See for example the History of Western Music by T&G (2013), a History of Opera by Abbate and Parker (2012), Grout and Williams (2002) and many others. Typically, these authors use descriptive narratives and analyses of music manuscripts. The objective of this paper is to complement these approaches by proposing a statistical analysis that captures similarity across pairs of composers by means of pairwise comparison of presence-absence of traits such as personal musical influences and musical/ecological characteristics. To this end, we use an approach that is based on (but different from) the earlier contributions by Smith and Georges (2014, 2015), using methods that have been developed in biosystematics, scientometrics, and bibliographic coupling.
The rest of the paper is as follows. The first section describes the data (influences network and ecological characteristics) and the methodology used in Smith and Georges (2014, 2015). The second section shows how the interaction of personal musical influences and ecological characteristics can provide a typology that could, in theory, lead to some evolutionary model of Western classical music. The third section introduces the centralised cosine measure as a statistical measure of similarity between composers. 9 The fourth section discusses some statistical results and the last section concludes.
Smith and Georges (2014, 2015) used data collected in 'The Classical Music Navigator' (Smith 2000; hereafter referred to as CMN). 10 One important part of the CMN is the presentation of composers' personal musical influences. Each of the 500 composers of the database is associated with a list of composers who have had a documented influence on a subject composer.  provide the following example in Fig. 2, which represents the network of influences on J. Haydn, W. A. Mozart, and Schubert, three Austrian composers born respectively in 1732, 1756, and 1797, who are typically associated with the 'Classical' period of Western classical music, with Schubert also being a transitional composer between the Classical and Romantic periods. A casual listening to J. Haydn, W. A. Mozart, and Schubert suggests similarities across them, although to a majority of listeners J. Haydn and W. A. Mozart would probably sound 'closer' than J. Haydn and Schubert, or W. A. Mozart and Schubert. To overcome the subjectivity issue noted in the Introduction, Smith and Georges (2014) infer similarities among composers by assuming that if two composers share many of the same personal musical influences, their music will likely have some similarities.
On the other hand, if two composers have been influenced by very distinct sets of composers, then their music is likely to have little similarity. Observe in Fig. 2 that these three subject composers share two influences: Handel and Gluck. There are no further common influences between Schubert and Haydn, but two additional common influences between Schubert and W. A. Mozart (M. Haydn and J. S. Bach) and five additional common influences between Haydn and Mozart. According to the assumption of Smith and Georges (2014), then, the larger number of common personal influences between J. Haydn and W. A. Mozart would cause (or even explain) the higher similarity between the music of these two composers than between Schubert and Mozart, let alone Schubert and J. Haydn. The third section confirms this with a methodology that generates similarity scores between any pair of composers, by means of pairwise comparison of presence-absence of personal musical influences, using the centralised cosine similarity measure. 11
A second collection of data in the CMN associates each of the 500 composers with characteristics such as time period, geographical location, school association, instrumentation emphases, etc., denoted 'ecological' categories for convenience.  have extracted 298 such ecological categories from the CMN. (See their paper for a complete list.) Thus, each composer is associated with a list of ecological categories, and the authors infer a statistical association between pairs of composers by assuming that if two composers share many ecological categories, then their musical 'ecological niches' are very similar, so that, in this sense, they may be considered similar. Figure 3 pursues the previous example for composers J. Haydn, W. A. Mozart, and Schubert and illustrates their musical ecological niches. 12 We see that Mozart and J. Haydn share a larger number of ecological characteristics than, say, J. Haydn and Schubert.
The contention is that this would cause a stronger similarity in the music of W. A. Mozart and J. Haydn than in the music of Schubert and J. Haydn. As before, it is also possible to compute similarity scores between any pair of composers, by means of pairwise comparison of presence-absence of ecological categories, and this will be implemented in the third section using the centralised cosine similarity measure. By introducing ecological characteristics, the basic objective in  was to explore the robustness of their earlier (2014) similarity results based on personal musical influences. They further propose a final list combining the ecological and influences network databases to assess similarities, arguing that this should produce a general improvement in the similarity rankings.

Data and background information on composers' similarity
This new paper, however, proposes a different approach. First, a new measure of similarity equipped with a statistical significance test, the 'centralised cosine measure', is used instead of the binomial index of dispersion used in Smith and Georges (2014, 2015). The centralised cosine measure is based on earlier literature in scientometrics and bibliographic coupling. Second, instead of merely combining together personal musical influences and ecological characteristics (to produce an improvement in similarity rankings) as proposed in , this paper points out that some additional information can be gained when the two sets of similarity indices are compared, especially when they provide conflicting information. This leads to interesting questions such as why particular composers sound different (e.g., composed in different ecological niches) even if they have been influenced by the same personal musical influences, and why they sound similar (e.g., composed in similar ecological niches) even in the absence of a common set of personal musical influences. The next section therefore develops a typology that highlights conflicting or reinforcing results, based on the influences network and ecological characteristics approaches, in a framework somewhat reminiscent of a biological evolutionary model.
Table 1. Source: Constructed by the author from raw data collected in 'The Classical Music Navigator' (Smith 2000), and reorganised
Table 1 Ecological characteristics associated with J. Haydn, W. A. Mozart, and Schubert. Source: Assembled from raw data collected in 'The Classical Music Navigator' (Smith 2000), and reorganised

Music evolution: a typology based on influences networks and ecological data
Personal musical influences lead to a sort of lineage among composers. If two composers have been musically influenced by, roughly, the same list of composers, they share the same ''cultural gene'' pool. In this case I refer to them as 'Most Similarly-Influenced Composers'.
Because of their common personal musical influences we might expect these composers to develop a roughly similar style of music and eventually to 'sound' similar. However, if they do not, this should lead to hypotheses as to why a pair of composers might have very similar personal influences and yet produce very different music. Therefore, we need a second set of data to help categorise the musical style of each composer, the ecological characteristics of music referred to in the previous section. I refer to a pair of composers sharing a large set of common ecological characteristics (and thus having very similar ecological niches) as 'Most Ecologically-Related Composers'. Table 2 illustrates the interaction between these two dimensions. If most similarly-influenced composers (on the basis of individual musical influences) are also most ecologically-related composers (on the basis of ecological data), then those composers are most similar (they share a very similar set of personal musical influences and a very similar set of ecological characteristics, that is, very similar ecological niches). In terms of Fig. 1, these composers are likely to be grouped into one of the vertical lines of the 'tree'. At the other extreme we have most dissimilar composers. In Fig. 1, it could be composers belonging to non-connected vertical lines representing very distinct musical periods and styles. But there are two other, perhaps more interesting, cases. First, why do composers produce music that 'sounds' different if they have the same lineage/personal musical influences? As mentioned in the Introduction, some composers may have developed a different music style through a process of 'musical' selection and 'speciation' whereby an inherited musical style adapts to local and social conditions through mutation/variation and differential fitness/competition that is akin to natural selection. 
If a subject composer is very similar to a series of other (contemporary) composers in terms of personal musical influences but at the same time mostly ecologically unrelated to them, then the music of this composer is likely to 'sound' different, to have evolved. In Table 2 this is represented as 'music speciation and evolution'. In Fig. 1, this would be represented by composers along the diagonal line (e.g., Gluck, Debussy, Schoenberg, etc.). The second interesting case is why particular composers 'sound' alike if their lineage is different. Two composers, although perhaps geographically distant, may have composed music that sounds alike because they belong to very similar musical ecological niches that lead to selection pressures to adapt and develop similar sounding forms, despite having a very different lineage, in a process that could be called musical 'convergent evolution'. See Table 2. In biology, one can identify convergent evolution wherein species that live in similar but geographically-distant habitats will experience similar selection pressures from their environment, causing them to evolve similar adaptations, or converge, coming to look and behave very much alike even when originating from very different lineages. 13 However, this possibility seems less likely in the case of Western classical music because the time frame is rather short and the spatial frame is small, so that 'convergence' may only play a rather minor role in the overall process of musical evolution.
Table 2 (cells) Most dissimilar composers; Adaptation: Music speciation and evolution b. Notes: Figure 6a-p in the fourth section will provide a visual representation of the table for any 'subject' composer with respect to all other 499 composers of the CMN. a Pairs of composers sounding alike despite lack of common lineage. b Pairs of composers sounding different despite a common lineage.
A simpler interpretation is that a composer, having little documented personal musical influences in common with another contemporary composer, and therefore being perhaps (although not necessarily) isolated in the network of composers, has nevertheless composed in an ecological niche reminiscent of the musical style of the other composer, producing music that sounds similar. By being imitators or followers, and perhaps not central to the musical scene, these composers contributed less to the evolution of the sound of Western classical music.
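The four cases of the typology can be sketched as a small decision rule. The sketch below is illustrative only: it assumes that a pair counts as 'similar' on a given dimension when its centralised cosine index is positive and passes the Z-test described in the next section (variance 1/n, critical value 1.96 at the 5% level). The function names, the default thresholds, and the label for the fourth cell are assumptions made for the example, not taken from Table 2.

```python
import math


def significantly_similar(csc: float, n: int, z_crit: float = 1.96) -> bool:
    """True when the index is positive and sqrt(n) * |csc| exceeds z_crit."""
    return csc > 0 and math.sqrt(n) * abs(csc) > z_crit


def classify_pair(csc_influences: float, csc_ecology: float,
                  n_influences: int = 500, n_ecology: int = 298) -> str:
    """Place a pair of composers in one of the four cells of the typology."""
    same_lineage = significantly_similar(csc_influences, n_influences)
    same_niche = significantly_similar(csc_ecology, n_ecology)
    if same_lineage and same_niche:
        return "most similar composers"
    if same_lineage:
        # Common lineage but different niche: sounding different.
        return "music speciation and evolution"
    if same_niche:
        # Similar niche without common lineage (hypothetical label).
        return "sounding alike despite lack of common lineage"
    return "most dissimilar composers"


# Hypothetical index values, chosen only to exercise the four branches.
print(classify_pair(0.52, 0.45))
print(classify_pair(0.30, 0.02))
print(classify_pair(0.01, 0.40))
print(classify_pair(-0.05, 0.00))
```

Treating a significantly negative index as 'not similar' is one possible reading; a finer typology could separate independence from negative association.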

The centralised cosine measure as an index of association/similarity
This section describes the methodology used in this article to assess the relationship (association/similarity) between pairs of composers. The discussion is couched in terms of personal musical influences but the methodology related to ecological categories is analogous. I first describe how I have conceptually organised the CMN database. This description draws on earlier articles by Smith and Georges (2014, 2015) and . Consider the set $C$ of all 500 composers ($n = 500$) who are included in the CMN. For any pair of composers $(i, j)$ with $i, j \in C$ (among the $n \times n$ possible pairs), we are interested in capturing whether a composer $k \in C$ had a reported influence on both $i$ and $j$, on $i$ but not $j$, on $j$ but not $i$, or on neither $i$ nor $j$. Running this across all composers $k$ for each pair $(i, j)$ we eventually obtain the set $I_i$ of all personal influences on composer $i$, and the set $I_j$ of all personal influences on composer $j$. Also, for any pair $(i, j)$, $I_i \cap I_j = CI_{i,j}$ is the set of composers $k$ that have influenced both $i$ and $j$; $I_i - (I_i \cap I_j) = I_{i,-j}$ is the set of composers $k$ that have influenced $i$ but not $j$; $I_j - (I_i \cap I_j) = I_{j,-i}$ is the set of composers $k$ that have influenced $j$ but not $i$; and $DI_{i,j} = I_{i,-j} \cup I_{j,-i}$ is the set of composers $k$ that have influenced either $i$ or $j$ but not both. From this we can produce a count table, given in Table 3, for any pair $(i, j)$ that sums the elements (the number of composers) in each of the four sets $CI_{i,j}$, $I_{i,-j}$, $I_{j,-i}$, and $C - CI_{i,j} - DI_{i,j}$, and from which similarity indices for all pairs of composers $(i, j)$ can be computed on the basis of well-known formulas. 14
13 A common illustration of this convergent evolution is the parallel evolution taking place in Australian marsupials versus placental mammals elsewhere.
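The four counts described above can be obtained directly with set operations. The following is a minimal sketch, assuming the influences data are held as a dictionary mapping each composer to the set of composers with a documented influence on him; the function name `pair_counts` and the toy data are hypothetical and do not come from the CMN.

```python
from typing import Dict, Set, Tuple


def pair_counts(influences: Dict[str, Set[str]], i: str, j: str,
                n: int) -> Tuple[int, int, int, int]:
    """Return the counts (a, b, c, d) for the pair (i, j).

    a: composers who influenced both i and j
    b: composers who influenced i but not j
    c: composers who influenced j but not i
    d: composers who influenced neither (n is the size of the database)
    """
    influences_i, influences_j = influences[i], influences[j]
    a = len(influences_i & influences_j)
    b = len(influences_i - influences_j)
    c = len(influences_j - influences_i)
    d = n - a - b - c
    return a, b, c, d


# Toy database of n = 10 composers (hypothetical names and influences).
toy = {"X": {"P", "Q", "R"}, "Y": {"Q", "R", "S", "T"}}
print(pair_counts(toy, "X", "Y", n=10))  # (2, 1, 2, 5)
```

The same function applies unchanged to ecological categories if each composer is mapped to his set of categories and n is set to the number of categories.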
In what follows I focus on the 'centralised' cosine measure in part because (unlike many other indices) this measure can be used to judge the statistical significance of the association between two composers. 15 Although the centralised cosine formula is based on the concepts underlying Table 3, it is not a straightforward application and therefore requires a slightly more structured presentation in order to establish a connection with the table. Here, the discussion follows closely . The ordinary (non-centralised) cosine similarity measure (also known as Salton's measure) is a statistic familiar to bibliometrics and scientometrics. The idea was mathematically formalized by Sen and Gan (1983) and later extended by Glänzel and Czerwon (1996) who also applied the methodology. As applied to the CMN database, consider each composer $i$ as an $n \times 1$ vector in the space of all $n$ composers in the database. If a composer $k$ among the $n$ composers was an influence on $i$, then the $k$th component of the vector corresponding to composer $i$ is set equal to 1, otherwise it is set equal to 0. Therefore, with respect to all composers in the database, each composer $i$ is represented by a Boolean vector of 0's and 1's. The cosine similarity measure for a pair of composers $(i, j)$, each represented by their own Boolean vectors $B_i$ and $B_j$, can then be computed as:
$$\cos(B_i, B_j) = \frac{\sum_{k=1}^{n} B_{k,i} B_{k,j}}{\sqrt{\sum_{k=1}^{n} B_{k,i}^2}\,\sqrt{\sum_{k=1}^{n} B_{k,j}^2}}$$
where subscript $k$ in $B_{k,i}$ indicates the $k$th component (of value 1 or 0) of vector $B_i$. Thus, in essence, the cosine of the angle between the two vectors $B_i$ and $B_j$ gives a measure of association/similarity. The cosine similarity index ranges between 1 and 0, where 1 indicates that two composers are exactly identical and 0 indicates complete opposition. A value somewhere in the middle of the 0-1 range indicates degrees of independence of two composers. As discussed in , when all the vectors are Boolean vectors, the null distribution of the cosine similarity under the assumption of independence between two composers is unknown and has a nonzero mean; in order to derive a statistical test for the cosine measure, a centralised cosine measure was proposed (Giller 2012). The centralised cosine measure is the cosine measure computed on the centralised vectors, with respect to the mean (average) vectors. Defining $\bar{B}_i = (1/n) \sum_{k=1}^{n} B_{k,i}$ and $\bar{B}_j = (1/n) \sum_{k=1}^{n} B_{k,j}$, the centralised cosine measure is:
$$CSC_{i,j} = \frac{\sum_{k=1}^{n} (B_{k,i} - \bar{B}_i)(B_{k,j} - \bar{B}_j)}{\sqrt{\sum_{k=1}^{n} (B_{k,i} - \bar{B}_i)^2}\,\sqrt{\sum_{k=1}^{n} (B_{k,j} - \bar{B}_j)^2}}$$
14 Dozens of measures of association have been studied in the biosystematics literature, such as the first and second Kulczynski coefficients (1927), the Jaccard coefficient (1901), the Dice coefficient (1945), the Simpson coefficient (1943), the binary distance coefficient (Sneath 1968), the binomial index of dispersion $\chi^2$ statistic (Potthoff and Whittinghill 1966), Salton's measure (1987) or its equivalent, the cosine similarity measure discussed in the scientometrics and bibliographic coupling literature (Sen and Gan 1983; Glänzel and Czerwon 1996).
15 The ''Appendix'' provides a comparison between the centralised cosine measure and another well-known measure, the binomial index of dispersion, that was used in Smith and Georges (2014, 2015).
In order to establish a connection between this formula and the elements in Table 3, I now use a result in  who proved that the centralised cosine measure can be computed as:
$$CSC_{i,j} = \frac{ad - bc}{\sqrt{(a+b)(a+c)(b+d)(c+d)}}$$
where $a$, $b$, $c$, $d$ are the counts of composers in the sets $CI_{i,j}$, $I_{i,-j}$, $I_{j,-i}$, and $C - CI_{i,j} - DI_{i,j}$ described above, and reported in Table 3. It can be shown that values of the centralised cosine measure range from -1.0 to 1.0. A value of 1.0 indicates that two composers are identical. A value of -1.0 indicates that two composers are complete opposites. A value of 0 shows that two composers are independent (unassociated). A nonzero value of the centralised cosine measure might be due to randomness or actual association between composers. Unlike in the case of the ordinary cosine measure, there is a proper statistical significance test. Under the assumption that the size of the database $n$ is large enough, the distribution of the centralised cosine measure (under the assumption of independence) is approximately normal, with mean 0 and variance $1/n$. Therefore, the distribution of the centralised cosine measure can be converted into a standard normal distribution using the Z-score/statistic:
$$Z = \mathrm{ABS}\left(\frac{CSC_{i,j}}{\sqrt{1/n}}\right) = \sqrt{n}\;\mathrm{ABS}(CSC_{i,j})$$
where ABS is the absolute value and $n$ is the size of the database at hand, that is $n = 500$ for the personal musical influences database and $n = 298$ for the ecological categories database. 16 Using the centralised cosine measure, Table 4 ranks composers in decreasing order of similarity to Debussy, on the basis of personal musical influences. The index identifies Ravel as the composer most similar to Debussy. The centralised cosine measure for Debussy and Ravel is 0.587. The corresponding Z-statistic is 13.119, which is greater than the critical value of 1.96 at a 5% significance level under the standard normal distribution. We can then reject the null hypothesis of no association between Debussy and Ravel. 17
As said above, when CSC takes a value of 0, this means that the two composers under consideration are 'independent' (unassociated). So, a negative value for CSC suggests that the composers are negatively associated. But what is the exact meaning of this? Recall that the centralised cosine measure is based on Boolean vectors. The Boolean vector for Debussy, $B_i = B_{Debussy}$, is a $(500 \times 1)$ vector of components $B_{k,Debussy}$ each equal to '1' or '0' depending on whether or not a composer $k \in C$ had a reported musical influence on Debussy. The Boolean vector for Carter follows an analogous definition. If the sets of personal musical influences on Debussy and Carter are such that $B_{k,Carter}$ is more often 1 (or 0) when $B_{k,Debussy}$ is 0 (or 1), then CSC will take a negative value and this suggests that Carter may have (deliberately or not) rejected composers that had a musical influence on Debussy while being influenced by others that had no reported musical influence on Debussy. This property of the centralised cosine measure provides a more sensitive measure of 'similarity' than the binomial index described in the ''Appendix'' (and previously used by Smith and Georges 2014, 2015) as it also tracks composers who (consciously or not) attempted to 'differentiate' themselves from others.
16 Note that square root of 1 is ±1, which is why we take the absolute value, ABS.
17 Table 4 indicates that we can reject the null hypothesis of no association between Debussy and the first 181 composers of the table (until and including Monk in Table 4) at a 5% significance level. Table 4 also shows results for the binomial index of dispersion discussed in the ''Appendix''. Note that as shown in the last column of the table, the binomial statistic for Debussy and Ravel is 172.1. Using the $\chi^2$ distribution, the critical value at a 5% significance level is 3.84. (For significance levels at 1 or 10%, the critical values are 6.63 and 2.70, respectively.) Because 172.1 > 3.84, we reject the null hypothesis of no association between Debussy and Ravel in favor of the alternative that these two composers are statistically significantly associated (in agreement with the conclusion drawn from the CSC index). Observe that we can reject the null hypothesis of no association with Debussy for the first 181 composers of the table: this is the same cutoff point for both the $\chi^2$ test (binomial index) and the Z-statistic (CSC index).
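The measures discussed in this section can be sketched in a few lines of code. The sketch below is not the author's implementation: it assumes the standard definition of the cosine on mean-centred vectors, uses the 2 × 2 count form (ad − bc)/√((a+b)(a+c)(b+d)(c+d)) as the equivalent expression in terms of the Table 3 counts, and takes Z = √n · ABS(CSC) from the stated null variance of 1/n. All function names and the toy vectors are hypothetical.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Ordinary cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def centralised_cosine(u: Sequence[int], v: Sequence[int]) -> float:
    """Cosine computed after subtracting each vector's mean.

    Undefined (division by zero) if a vector is all 0's or all 1's.
    """
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine([x - mean_u for x in u], [y - mean_v for y in v])


def centralised_cosine_from_counts(a: int, b: int, c: int, d: int) -> float:
    """Same index expressed with the four counts of the 2 x 2 table."""
    return (a * d - b * c) / math.sqrt((a + b) * (a + c) * (b + d) * (c + d))


def z_statistic(csc: float, n: int) -> float:
    """Z-statistic under the null of independence (mean 0, variance 1/n)."""
    return math.sqrt(n) * abs(csc)


# Toy Boolean vectors over n = 10 composers: a = 2, b = 1, c = 2, d = 5.
u = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
v = [0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(centralised_cosine(u, v), centralised_cosine_from_counts(2, 1, 2, 5))

# Fully disjoint influences on a 4-composer toy database give -1.
print(centralised_cosine_from_counts(0, 2, 2, 0))

# Debussy-Ravel index as rounded in the text (0.587, n = 500).
print(z_statistic(0.587, 500))
```

With the rounded index 0.587 the last line gives a value close to the 13.119 reported for Debussy and Ravel; the small gap presumably reflects rounding of the index.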
18 For all 500 subject composers, two tables of similarity indices have been generated, one on the basis of the personal musical influences database (as done in the example for Debussy), and one that is based on the 298 ecological characteristics database. The large number of indices computed (2 × 500 × 500) forces us to report average results for subsets of composers and specific results for a few composers only. Before doing this in the next section, observe Figs. 4 and 5. Figure 4 gives the ten most similar composers to J. Haydn, Mozart, and Schubert, on the basis of personal musical influences using the centralised cosine similarity measure developed in this section. Observe the differences between Figs. 2 and 4. Figure 2 provides composers who had a reported influence on these three subject composers.
Notes (1) The number in front of a composer's name gives his ranking (in terms of importance), as defined in the CMN. This is the primary ranking discussed in the next section.
The assumption in the first section was that the larger number of common personal influences between W. A. Mozart and J. Haydn would cause (or even explain) the higher similarity of styles between these two composers than between Mozart and Schubert, let alone J. Haydn and Schubert. Figure 4 confirms that J. Haydn and Mozart have a higher centralised cosine similarity index (0.52) than Mozart and Schubert (0.36) or Haydn and Schubert (0.26). 19 Figure 5 gives the 10 most similar composers to J. Haydn, W. A. Mozart, and Schubert on the basis of ecological characteristics. Two things are worth noticing. First, when comparing similarities on the basis of personal musical influences and ecological data there are only three common names in the two lists of the 10 most similar composers to J. Haydn (i.e., Mozart, Beethoven, Boccherini), three common names in the lists for Schubert (i.e., Rossini, Mendelssohn, Bruckner), and five common names in the two lists related to Mozart (J.
Haydn, JC Bach, Salieri, Schubert, Beethoven). This is not surprising because personal musical influences and ecological data provide two different perspectives on the concept of similarity. Second, observe that most composers similar to Mozart and to Haydn are, in both lists, Classical period composers. However, many composers similar to Schubert on the basis of ecological characteristics are Romantic period composers (R. Schumann, C. Franck, Grieg, Fauré, Mahler-all composers born (2) The number on the edge linking any pair of composers gives the centralised cosine similarity index (on the basis of personal musical influences) between the two composers. Note that the width of the edge also proxies the degree of similarity quite after Schubert). Yet, the similarity list based on personal musical influences (lineage) suggests that Schubert is strongly associated to older composers of the Classical period (e.g., Reicha, Salieri, Carulli, Méhul, and Rossini). This confirms the insight of the previous section-Exploiting the conflicting results generated by the two databases is a useful approach to detect transitional-period composers such as Schubert, whose lineage is still anchored in the Classical period while his musical ecological niche pulls him towards the Romantic period. 20 This explains to some extent music 'speciation' and evolution-a large number of Schubert's compositions 'sound' different from the music of Mozart and Haydn, even if Schubert's influences network (lineage) remains anchored in the Classical period. This also suggests that a presentation analogous to Table 2 could help us detect music speciation and evolution. This is explored further in the following section. (2) The number on the edge linking any pair of composers gives the centralised cosine similarity index (on the basis of ecological characteristics) between the two composers. 
Note that the width of the edge also proxies the degree of similarity 20 Figures 4 and 5 also give the birth date of each composer (in front of the name) so that we can compute the sum of the age differentials between all composers similar to Schubert and Schubert himself. The sum is -49 years in the influences network case, and ?161 years in the ecological database case. This clearly indicates that while the personal influences network associates Schubert with composers relatively older than him, the ecological database associates him with much younger composers. The same calculations for Mozart provide sums of age differentials of -40 years and -145 years, respectively, demonstrating that the ecological niche of Mozart was rather backward-looking. For Haydn, we get ?148 years and ?43 years, respectively. Although Haydn's musical ecological niche is clearly forward-looking (as he is typically associated with innovations in Symphonic and String Quartets compositions), Haydn is also forward-looking with respect to his influences network. From this perspective, his ecological niche is in concordance with his influences network, as in the case of Mozart. This is not the case for Schubert.
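The age-differential sum of Footnote 20 is simple arithmetic and can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sign convention (birth year of the similar composer minus birth year of the subject, so that negative totals point to older associates) is inferred from the text, the function name is hypothetical, and the list below is a partial subset of the names cited above with their commonly reported birth years, so it does not reproduce the -49 figure obtained from the full list of ten.

```python
def sum_age_differentials(subject_birth, similar_births):
    """Sum of (birth year of similar composer - birth year of subject).

    Assumed convention: a negative total means the similar composers
    are, on balance, older than the subject composer.
    """
    return sum(birth - subject_birth for birth in similar_births.values())


# Partial, illustrative list only (not the full top-ten list of Fig. 4).
schubert_birth = 1797
partial_influence_list = {
    "Reicha": 1770,
    "Salieri": 1750,
    "Carulli": 1770,
    "Méhul": 1763,
    "Rossini": 1792,
}

total = sum_age_differentials(schubert_birth, partial_influence_list)
print(total)
```

Applying the same function to the list obtained from the ecological database would give the second sum discussed in the footnote.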

Selected statistical results and discussion
Built from the perspective of a 'subject' composer, Fig. 6a-p plot vectors (dots) representing other composers located relative to the 'subject' composer according to their similarity in terms of personal musical influences (X-axis) and ecological characteristics (Y-axis). For purpose of clarification, we will refer to these 'other' composers (the dots in Fig. 6a-p) as 'object' composers in the sense that they are compared to one unique 'subject' composer. For example, in Fig. 6a Beethoven is the 'subject' of the analysis and Brahms, Dvořák, etc. are 'object' composers located (with dots) relative to Beethoven. Furthermore, 'object' composers are grouped into four categories according to an age relationship with the 'subject' composer: 1. Composers dead 0-25 years before the birth of the 'subject' composer, 2. Older contemporary composers, 3. Younger contemporary composers, and 4. Composers born 0-25 years after the death of the 'subject' composer. See Fig. 6a, d, g, h, respectively, for 'subject' composer Beethoven.
Fig. 6 A few selected 'subject' composers. Notes (1) Each dot in these figures is a vector that represents an 'object' composer, located relative to the 'subject' composer of the figure, according to the values of two similarity indices based on: (1) personal musical influences (lineage) on the X-axis and (2) musical ecological niches on the Y-axis. The axes do not cross at the origin but at the critical values delimiting statistically-significant similarity index values (above) versus independence/dissimilarity (below). (2) The number in front of a composer's name is a ranking which reflects the importance of this particular composer. This is the primary ranking established in 'The Classical Music Navigator' (Smith 2000), and also discussed in main text of this section
Note that the two axes in all panels of Fig. 6 have been drawn at their critical significant values at 5%. Given Eq. 4, the Z-statistic is at its critical value when $Z = \lvert \mathrm{CSC} \times \sqrt{n} \rvert = 1.96$. The value for n is 500 in the case of the influences network database, and 298 in the ecological characteristic database. Thus, the critical values are $\mathrm{CSC}_c = \pm 1.96/\sqrt{500} = \pm 0.0877$ and $\mathrm{CSC}_c = \pm 1.96/\sqrt{298} = \pm 0.1135$, respectively. The four quadrants delimited by the two positive critical values correspond to the four cells in Table 2. Thus, the word 'high' in Table 2 is now assumed to represent a statistically significant positive association between 'object' and 'subject' composers, and the word 'low', no statistically significant association. 21 In some panels of Fig. 6, we can also see vertical and horizontal spikes of dots at the origin (zero). These dots represent independence (along one of the two criteria). Observe therefore four cases: (1) 'Object' composers who score high on both indices are located in the North-East quadrant and are considered to be very similar to the 'subject' composer. (2) 'Object' composers who score low on both indices are located in the South-West quadrant. Their association to the 'subject' composer is statistically insignificant on both criteria and they are considered to be most dissimilar to the 'subject' composer. 
(3) 'Object' composers who score high on the personal influence index, but low on the ecological index, with respect to the 'subject' composer, are located in the South-East quadrant. Their ecological niches differ from that of the 'subject' composer, even if they share a common lineage of personal musical influences. As we argued before, this may be a sign of music speciation and evolution. (4) 'Object' composers who score low on the personal influence index but high on the ecological index with respect to the 'subject' composer are located in the North-West quadrant. Despite little or no common personal lineage with the 'subject' composer, they have developed a somewhat similar sound by composing in musical niches that share many ecological characteristics. Using evolutionary biology terminology, this could be a sign of 'convergent evolution'.
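The critical values and the four-quadrant typology described above can be sketched in Python as follows. This is a minimal illustration, not the author's code: the function names are hypothetical, the index pairs in the example are made up, and the sketch assumes that 'high' means a similarity index strictly above the positive critical value on the relevant axis.

```python
import math
from collections import Counter

Z_CRIT = 1.96  # two-sided 5% critical value of the standard normal


def critical_csc(n, z=Z_CRIT):
    """Positive critical value of the similarity index: z / sqrt(n)."""
    return z / math.sqrt(n)


CSC_C_INFLUENCE = critical_csc(500)  # influences network database
CSC_C_ECOLOGY = critical_csc(298)    # ecological characteristics database


def quadrant(csc_influence, csc_ecology):
    """Classify an 'object' composer relative to a 'subject' composer."""
    high_x = csc_influence > CSC_C_INFLUENCE  # X-axis: personal influences
    high_y = csc_ecology > CSC_C_ECOLOGY      # Y-axis: ecological niche
    if high_x and high_y:
        return "North-East"   # similar on both criteria
    if high_x:
        return "South-East"   # common lineage, different niche
    if high_y:
        return "North-West"   # different lineage, similar niche
    return "South-West"       # no significant association


def quadrant_densities(pairs):
    """Share of 'object' composers falling in each quadrant."""
    counts = Counter(quadrant(x, y) for x, y in pairs)
    total = len(pairs)
    names = ("North-East", "South-East", "North-West", "South-West")
    return {q: (counts[q] / total if total else 0.0) for q in names}


# Made-up (influence index, ecological index) pairs for illustration only.
example_pairs = [(0.52, 0.30), (0.20, -0.05), (0.01, 0.40), (0.00, 0.00)]
print(round(CSC_C_INFLUENCE, 4), round(CSC_C_ECOLOGY, 4))
print(quadrant_densities(example_pairs))
```

Keeping the thresholds in one helper makes it easy to redraw the axes at another significance level by changing the `z` argument.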
Of course, a high positive value for a similarity index reveals a significant association between a pair of composers, but does not imply causality. Still, by grouping composers on the basis of an age relationship with the 'subject' composer, we can to some extent identify the antecedent, or 'causality in similarity'. For example, if an 'object' composer is located in the South-East quadrant but died before the birth of the 'subject' composer, then music speciation/evolution should be attributed to the 'subject' composer. The latter distanced himself from the former by composing in a different musical/ecological niche. However, for the same South-East location, music evolution/speciation should be attributed to the 'object' composer if he was born after the death of the 'subject' composer. Extending this reasoning to the case of contemporary composers (both alive at one point in time) is of course ambiguous. A much younger contemporary composer is likely to be the one imitating, or differentiating himself from, the older composer. But some degree of cross-imitation must be expected from composers of similar ages. Figure 6a-p applies this graphical approach to a few composers such as Gluck, Beethoven, Wagner, Debussy, and Schoenberg, and I discuss their specifics later on in this section. As one cannot make general statements about the evolution of Western classical music, a major art form whose study is a gigantic undertaking, based on an analysis of just five 'subject' composers, I first start by establishing some general observations. Tables 5a-c present statistics covering individual composers, some subsets, and all of the 500 composers included in the database. Table 5a is essentially equivalent to Fig. 6. For example, Table 5a is divided into four panels corresponding to the four age relationships between 'subject' and 'object' composers. 
Table 5a also gives the density in each quadrant (North-East, South-East, North-West and South-West), that is, it computes with respect to a 'subject' composer, frequencies of occurrence of 'object' composers located in each quadrant. Table 5a reports results for five specific 'subject' composers (Monteverdi, Gluck, Beethoven, Debussy, and Schoenberg). 22 However, this computation was done for all 500 'subject' composers and Table 5b reports average results over all 500 'subject' composers. The first column in the first panel of Table 5b ('subject' composer vs. composers dead 0-25 years before the birth of the 'subject' composer) gives the mean (and standard deviation in brackets) of these frequencies computed over all 500 'subject' composers-4, 11, 25 and 60% for, respectively, the North-East, South-East, North-West, and South-West quadrants. The results illustrate that, on average, composers strongly differentiate from recently dead composers. Sixty percent of them compose in a different ecological niche (from the one associated to dead composers) and have no significant similarity on the basis of personal musical influences (South-West quadrant). Only 4% of them are statistically similar to those dead composers with respect to ecological niche and personal influences (North-East quadrant). 23 Finally, observe the much higher density in the North-East quadrants and lower density in the South-West quadrants in the first column of panels 2 and 3, where 'subject' composers are compared to either older or younger contemporaries, respectively. This suggests an overall larger tendency for crossimitation between pairs of contemporaries (higher similarity in personal musical influences and ecological niches). We pursue the analysis by considering subsets of 'subject' composers regrouped into rankings such as  most 'important' composers. 
24 We also grouped them by periods such as all 48 Renaissance composers included in the Classical Music Navigator (CMN) database, all 50 Baroque composers, all 57 Classical, all 146 Romantic, and all 195 Modern composers. 25
22 Using Beethoven as an example of how Table 5a is constructed on the basis of Fig. 6, observe that panels 3 and 4 in Table 5a show that 33% of Beethoven's younger contemporary composers are located in the North-East quadrant of Fig. 6h while only 6% of composers born 0 to 25 years after Beethoven's death are located in the North-East quadrant of Fig. 6a. In Fig. 6h, we see indeed that 18 'object' composers are in the North-East quadrant out of a total of 54 composers included in the graph. Setting y = 18 and n = 54, we get that p = 18/54 = 0.33 as reported in Table 5a. For Fig. 6a, y = 2, n = 33, and p = 0.06. We can test whether the two proportions $p_1$ and $p_2$ are statistically different, that is: $H_0: p_1 = p_2$ versus $H_A: p_1 \neq p_2$. We need to compute: $Z^* = \dfrac{p_1 - p_2}{\sqrt{p^*(1 - p^*)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$, where $p^* = \dfrac{y_1 + y_2}{n_1 + n_2}$. In our example, $Z^* = 2.94 > 1.96$. Hence we reject the null hypothesis that the two proportions $p_1$ and $p_2$ are the same, in favour of the alternative that their difference is statistically significant at a 5% level. Alternatively, in terms of P value, Pr(Z > 2.94) = 0.0016 < 0.05, and we again reject the null hypothesis.
23 This is also confirmed when looking at the first column in panel 4 of Table 5b (representing 'subject' composers vs. composers born 0-25 years after the death of the 'subject' composers). Only 4% of the composers are strongly similar to the (dead) 'subject' composers, while 61% are statistically independent across both personal and ecological categories.
24 Composers who belong to the 'canon' of classical music are not necessarily ranked, but scientists can count the number of lines or pages devoted to them in major music encyclopedias, the number of recordings available, etc., and then turn scores into a ranking. Although the rankings per se (and underlying aggregation methodology) are controversial and often discredited by musicologists, the collection of names in these lists, instead of the ranking, may provide useful information.
25 I left out the group of pre-Renaissance composers. Also, whether a composer falls into a specific period is based on the categories given in the CMN.
Of the three rankings used here, the first one is the 'primary' ranking of the Top-100 composers computed by Smith (2000) in the CMN, from which Top-20 and Top-50 rankings are also derived (and referenced in Table 5b as TOP 20S, TOP 50S and TOP 100S). 26 The second one is Smith's 'secondary' ranking of most 'influential' composers, based on the list and the (primary) ranking of those composers who were influenced by the composer under study (TOP 20iS in Table 5b). 27 The third one (TOP 20M in Table 5b) is the Top-20 ranking proposed by Murray (2003). 28 In the following, I only discuss results for Top-20 composers according to the primary ranking of Smith (TOP 20S), because other rankings give roughly similar results. Hence results are robust and do not depend on the method underlying the construction of these rankings. Compare first and second columns in panel 2 of Table 5b (Columns ALL and TOP 20S) and think of the mean across all 500 'subject' composers (first column) as the result pertaining to an 'average' subject composer. We therefore see that Top-20 'subject' composers have (on average) denser North-East and South-East quadrants than the average 'subject' composer (0.46 > 0.28 and 0.31 > 0.19). 
This suggests that the creative process of Top-20 composers (even more so than for an average composer), is not due to genius alone but is based on personal musical influences, in particular a strong similar lineage (or network of personal influences) with older contemporaries. This reminds the much-quoted expression attributed to Isaac Newton: ''if I have seen further, it is by standing on the shoulders of giants.'' Concentrating more specifically on the South-East quadrant, we observe that it is denser for Top-20 'subject' composers than for the average 'subject' composer (31 vs. 19%). According to our typology in Table 2, this suggests that major composers, while also sharing personal musical influences with older contemporaries, contributed more than the 'average' composer to music evolution by composing in a different (i.e., new) musical ecological niche, which, in turn, made them sound 'different' from the average composer. On the other hand, the North-West quadrant for the average 'subject' composer is denser than the one corresponding to Top-20 'subject' composers (25 vs. 8%). This means, first, that the 'average' composer has a distinct personal lineage (from the one of older contemporary composers), suggesting that the 'average' composer is somewhat isolated, or perhaps less well-connected (than Top-20 composers) to the network of key influences. Secondly, this means that the 'average' composer is more likely to share the musical ecological niche of older contemporaries, eventually producing music that sounds somewhat similar (convergent evolution), and as such contributing less to the evolution of Western classical music. Although panels 1 and 3 of Table 5b can generally be interpreted along similar lines as panel 2, panel 4 brings an interesting twist. Imitation or differentiation, in panel 4, must be attributed to the 'object' composer as the 'subject' composer is dead. 
Panel 4 therefore means that 'object' composers are more likely to differentiate themselves (or at least be independent) from Top-20 composers than from an 'average' composer (64 vs. 61%). 29 This perhaps reflects the idea that new generations try to differentiate themselves in particular from top (dead) composers, for fear of being categorised as 'epigones' by music historians and eventually forgotten by the public. 30 One problem with our focus on Top-20 'subject' composers is that they are not necessarily 'innovators' or 'transitional' composers, (i.e., composers located on the diagonal in Fig. 1 as identified by musicologists). For example, few musicologists would consider J.S. Bach or W. A. Mozart, two major composers, to be genuine innovators. An alternative strategy is therefore to compare innovators and/or transitional figures with composers of the music period from which they progressively diverged, for example, by comparing Monteverdi to all Renaissance composers, Gluck versus all Baroque composers, Beethoven versus all Classical composers, Debussy and Schoenberg versus all Romantic composers. We therefore propose to compare statistical results for specific 'innovators' in Table 5a with results for the 'average' subject composer of a specific period in Table 5c. Focusing on panel 2 in both tables, we see that the South-East quadrant for specific 'innovators' in Table 5a is denser than the quadrant corresponding to the 'average' composer of the period from which they progressively diverged. In the case of Beethoven, 42% of his older contemporaries fall in the South-East quadrant while the corresponding number is just 16% for the 'average' classical composer. 
This not only means that Beethoven was better connected (than the 'average' classical composer) to older contemporaries in terms of personal musical influences (i.e., 'standing on the shoulders of giants'), but also that he was progressively composing in a different musical ecological niche (than the one of the 'average' classical composer), leading to a change of sound in classical music and opening the way to the Romantic period. This is also true for Monteverdi (11%) versus the 'average' Renaissance composer (2%) or Gluck (26%) versus the 'average' Baroque composer (6%). This, however, is just marginally true for Debussy (vs. the 'average' Romantic composer), and not true for Schoenberg (20 vs. 25% for the 'average' Romantic composer). 31 One difficulty is, of course, the concept of an 'average' Romantic composer who would be representative of a rather long period, itself divided into very distinct sub-periods (early, middle and late Romantic periods), each having its own 'innovators' or transitional composers. Besides, it is also informative to recall that Schoenberg felt that his early music would prove his understanding of and respect for tradition. 32 This perhaps explains our results in panel 2 of Table 5a (or in Fig. 6m) that characterise Schoenberg as building on the Romantic tradition (very dense North-East quadrant, 54%) instead of being exclusively characterised as an innovator.
29 On a cautious note, the difference between the two proportions is not statistically significant according to the methodology presented in Footnote 22. Hence we should avoid extracting too much musicological information from this fact.
30 For example, much of the traditional symphonic writing fell out of fashion after Beethoven's Ninth Symphony (1824). That Joachim Raff (1822-1882), a prolific and very well-known traditional symphonist of his time (but born just 5 years before the death of Beethoven), tends to get little attention in music history books, shows the ease with which the historian's attention is captured by novelty.
31 We can test the difference in proportions using the methodology in Footnote 22. The differences are statistically significant at 5% for Monteverdi, Gluck, and Beethoven versus their corresponding 'average' composers of the period from which they progressively diverged. For Debussy and Schoenberg versus the 'average' Romantic composer, however, we cannot reject the null hypothesis of no difference in proportions.
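The test of a difference in proportions invoked in Footnotes 22, 29 and 31 can be sketched in Python using only the standard library. The sketch assumes the standard pooled two-proportion Z statistic, which is consistent with the pooled proportion p* defined in Footnote 22; the function names are hypothetical. With the Beethoven counts quoted in Footnote 22 (18 of 54 versus 2 of 33) it returns a statistic of about 2.93, in line with the reported value of 2.94.

```python
import math


def two_proportion_z(y1, n1, y2, n2):
    """Pooled Z statistic for H0: p1 = p2 (assumed standard form)."""
    p1, p2 = y1 / n1, y2 / n2
    p_pool = (y1 + y2) / (n1 + n2)  # p* in Footnote 22
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se


def upper_tail_p(z):
    """Pr(Z > z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Counts from Footnote 22: Fig. 6h (18 of 54) versus Fig. 6a (2 of 33).
z = two_proportion_z(18, 54, 2, 33)
p = upper_tail_p(z)
print(f"Z* = {z:.2f}, Pr(Z > Z*) = {p:.4f}")
print("reject H0 at 5%" if abs(z) > 1.96 else "cannot reject H0 at 5%")
```

The same two functions can be applied to the quadrant counts behind any pair of percentages in Tables 5a-c, provided the underlying counts (not only the percentages) are available.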

Scientometrics
After these general observations, I now pursue with a few specific results related to Fig. 6a-p for 'subject' composers Gluck, Beethoven, Wagner, Debussy, and Schoenberg (of which Gluck, Beethoven, Debussy and Schoenberg are viewed by musicologists as innovators and 'transitional' composers, and therefore positioned on the diagonal in Fig. 1). The objective is to demonstrate that our results, based on a statistical methodology, confirm many facts well-known to musicologists.
First, observe again that after the death of the 'subject' composer, there is a strong tendency for newer generations of composers to seek different personal lineages and/or musical ecological niches (i.e., the North-East quadrants of Fig. 6a-c, have a very low density of dots relative to other quadrants, in particular the South-West quadrant). Figure 6c shows that twentieth century composers Xenakis, Berio, Reich and Glass who were born 0-25 years after the death of Debussy, are quite different from him on both criteria. See also Fig. 6a for Beethoven and Fig. 6b for Wagner. Although this confirms the general result observed previously, it is worth emphasizing that this is a differentiation away from 'subject' composers (such as Beethoven, Wagner, or Debussy) who are known to have had direct influences on younger contemporary composers. Hence, a strong process of music evolution and differentiation operates over time, across new generations. Of course, there are exceptions. A composer such as Brahms, born after the death of Beethoven, appears in the North-East quadrant of Fig. 6a, suggesting strong similarities with Beethoven. And, as is well known, Brahms' First Symphony (from 1876) has often been compared to the Ninth Symphony of Beethoven (1824). 33 Music evolution and differentiation can also be viewed from another side, when observing graphs of 'object' composers who were dead before the birth of the 'subject' composer. We observe in Fig. 6d-f a low density of 'object' composers in the North-East quadrant, which suggests that the 'subject' composer (respectively, Beethoven, Wagner, and Debussy) distanced himself from past generations of composers in terms of musical ecological niche and/or personal lineage.
Second, results are quite different from those reported above when considering 'contemporary' composers (Fig. 6g-p). In this case, we typically observe a large density of dots in the North-East quadrants, suggesting a process of imitation. For example, we see the common personal lineage and ecological niches of Beethoven with older contemporaries such as J. Haydn and W. A. Mozart (Fig. 6g). Then, it is the turn of younger contemporaries such as Hummel, Schubert, Mendelssohn, to also 'imitate' Beethoven to some extent (Fig. 6h). We see the extent to which Wagner's music is both a product of his time and a music that has been imitated, with a large density in the North-East quadrant for older composers (Berlioz, Meyerbeer, Glinka, Nicolai in Fig. 6i) and younger contemporaries (Gounod, Borodin, Bizet, Massenet in Fig. 6j). We see a strong similarity of Debussy with some of his older contemporaries in Fig. 6k (Franck, Fauré, Chabrier, Chausson), and we see Ravel and Roussel subsequently embracing Debussy's impressionism (Fig. 6l). We see the middle and late Romantic heritage of Schoenberg (e.g., Brahms, R. Strauss, Mahler, Reger) in Fig. 6m and then we see Berg and Webern developing the innovative dodecaphonic (or twelve-tone) method of composition of Schoenberg (Fig. 6n). We finally see in the North-East quadrant of Fig. 6p that Gluck and Jommelli (both born in 1714) are very similar. Gluck's reforms of the opera will be discussed shortly. However, note that Jommelli is also known for his reforms of the Italian opera, so much so that he has been called the 'Italian Gluck' (Grout and Williams 2002).
32 As evidenced by a letter from 1923 to conductor Werner Reinhart, reproduced and translated in Stein (1975) in his selected writings of Schoenberg, the composer wrote: ''I do not attach so much importance to being a musical bogeyman as to being a natural continuer of properly-understood good old tradition!''.
33 As T&G (2013) recount, the pianist and conductor Hans von Bülow hailed it as the 'Tenth Symphony' and then proclaimed a new holy trinity of classical music ('Bach, Beethoven and Brahms') that has lived on, ever since, in the catchphrase 'the three B's'.
Third, relationships among contemporaries are not just limited to a process of imitation; we also see a process of differentiation and evolution among them as the South-West and South-East quadrants are also densely populated in Fig. 6g-p). According to our typology, Gluck's music is different from earlier Baroque contemporaries such as A. Scarlatti and, later on, Handel (South-West quadrant of Fig. 6o). Indeed, Gluck's reforms of the opera of the mid-eighteenth century was a reaction to the excesses of 'pre-reforms' Baroque opera seria (and the virtuosic display of da capo aria) of composers such as A. Scarlatti and followers. 34 He abolished vocal virtuosic excess for its own sake so that the music would serve the needs of the drama, that linguistic elements took place over purely musical considerations, that realism was privileged over fantasy or irrationality. Gluck's operas, despite all his reforms, also follow the conventions of the older French Tragédie Lyrique, including the use of librettos in French language, which tends to explain the common lineage with Rameau and other French baroque composers located in the South-East part of Fig. 6o, despite the obvious evolution from their music. 35 Continuing with other composers who changed the sound of music, we see that Liszt, a younger contemporary of Beethoven, has developed a music different from the one of Beethoven despite having a similar lineage (Fig. 6h, South-East quadrant). 36 Much of the symphonic writing ('traditional', 'non-programmatic', 'multi-movement' symphony) fell out of fashion after Beethoven's Ninth symphony (1824). From that point onwards, the last symphony of Schubert, and those of Mendelssohn and Schumann, however magnificent they are, could only be regarded as the works of epigones. And symphonies composed yet later on, in the 1850s and 1860s, by conservative composers such as Anton Rubinstein, Carl Reinecke, Max Bruch, or Joachim Raff have not successfully survived the repertoire. 
34 As T&G (2013) explain, A. Scarlatti, a culminating figure of Baroque opera at the turn of the 18 th century, laid the foundation of opera seria (serious opera) and the da capo aria which includes a last section that is essentially unwritten but becomes an opportunity for the singer to do free-form spontaneous embellishment and improvisation, ensuring a virtuoso display and the kind of spectacular performance on which public opera has always thrived. With time, most great singers (of which the well-known Farinelli) carried around a portfolio aria that could be inserted whenever they sang, even if irrelevant to the context. Although the operas of Handel are cast in the same mold as other opera seria, he typically gave performers less room to manoeuvre. This led to a decline in interests from performers and the public and forced Handel out of opera and into English oratorios. 35 Incidentally, the relative position of Rameau and Pergolesi in this graph reminds the so-called 'Querelle des bouffons' (the War of the Buffoons), which in the 1750s (a long generation before the French revolution) foreshadowed not only musical change but also political and social change. As T&G (2013) explain, Jean-Jacques Rousseau and Diderot ridiculed the high-minded French Tragédie Lyrique and Pastorale Héroique in the style of Lully, Rameau, Leclair (Scylla et Glaucus), and Mondonville (Titon et l'Aurore), which was performed by the royal musical establishment. Furthermore, Rousseau argued that French language was not suitable for operas. Instead, Rousseau was glorifying the 'modern' style of Italian operas buffa and intermezzos, including the most popular at the time-La serva padrona-by Pergolesi, brought to Paris in 1752. 36 Liszt's teacher, Carl Czerny, was himself the pupil of Beethoven (among others). Czerny's compositional style and teaching often mimicked Beethoven himself, and much of Franz Liszt's early learning can be said to have come from Beethoven himself (Mao 2012). 
which any other is composing. We see perhaps 'convergent evolution' in Fig. 6k (North-West quadrant) for U.S.-born Samuel Barber, a (much) younger contemporary of Debussy who, despite some American feel of his music, was rather isolated over there, and composed in an ecological niche (concertos, symphonies, opera) that was much closer to the late-Romantic European composers than the ecological niche (including jazzy elements and film music) of U.S. composers of his time such as A. Copland or L. Bernstein. We also see Gershwin whose composition, An American in Paris, reflects the journey that he had consciously taken as a composer. As cited in Hyland (2003), Gershwin declared with respect to this composition: ''The opening part will be developed in typical French style, in the manner of Debussy and Les Six, though the tunes are original''. 40 And despite all the jazzy elements of his music, his piano Concerto in F was criticised for being too much related to the work of Debussy. Despite Gluck's opera reforms mentioned earlier, and his separate network of influences (including his partisans opposed to the famous poet and librettist Metastasio and his circle of opera seria composers using dazzling artifices), he was part of the transition between Baroque and Classical Periods, sharing the ecological niche of many composers (e.g., Fasch, Hasse and Pergolesi in Fig. 6o and Piccinni in Fig. 6p) who were also contributing to mid-eighteenth century stylistic changes, suggesting a convergent evolution. 41

Conclusion and future research
This paper uses two databases, the personal influences and the ecological categories databases extracted from the CMN, to test, statistically, for similarity between pairs of composers, using the centralised cosine similarity index. Each of these two databases captures one aspect of similarity across pairs of composers. As such, this is a contribution to the music information retrieval research. However, this paper goes one step further by using the two similarity rankings conjointly in order to generate a typology of cases that makes it possible to explore music imitation and differentiation, music 'speciation' and 'convergent evolution'. That results in the fourth section corroborate many facts well known to musicologists is indicative of a sound database and methodology. This said, although there is scope for a true evolutionary model of Western classical music, including the construction of a phylogenetic tree, there are also challenges. In biological systematics, one is typically given some group of species (from within a large genus), and data on some number of their adaptive traits (plus external knowledge on which traits are viewed as more primitive than the others). Then, various algorithms have been developed to produce a family tree having the most likely chance of accurately reflecting speciation patterns over time. But in that instance, there is the useful simplification that each species comes from only one other, whereas in the Western classical music context, the 'events' (particular composers) are the product of multi-influence.
40 Les Six is a name given in 1920 to a group of six French composers who worked in Paris, of which Milhaud, Poulenc and Honegger are the most well-known. 
41 The famous rivalry between 'Piccinnists' and 'Gluckists', a 'querelle' in Paris opposing dramatic versus musical values of two operas composed on the same subject, Iphigénie en Tauride, seems to suggest a stronger differentiation than the one implied by the North-West location of Piccinni versus Gluck. However, for T&G (2013) ''[b]oth were equally, though differently, a sign of the intellectual, philosophical, and social changes that were taking place over the course of the eighteen century. (…) The two composers, privately on friendly terms, were more allies than rivals,'' suggesting indeed a convergent evolution despite their respective advocates and their conflicting network of influences that led them to a collision course.
Hence, at this stage, it is best to see our work as preliminary background. First, it will take some time to sort through the numerous results obtained with the methodology introduced in this paper. Second, there is a need to improve this framework using a finer analysis, one that would introduce specific sub-periods (early, middle, and late Romantic periods, subdivisions of the twentieth century, etc.), and that would consider additional age categories among contemporaries (not just older vs. younger contemporary composers). Third, current results and their limitations are also driven by the information available in the CMN. One limitation is that the CMN data suffer from some spottiness, as many of the less significant composers on the list of 500 remain incompletely studied or commented upon. Musicological research on composers is an ongoing effort and newly discovered influences from (and on) lesser composers must progressively be included in the CMN. A large-scale literature review should reduce this problem and would make it possible to improve our narrative of Western classical music evolution based on statistical analysis and methods developed in biosystematics, scientometrics and bibliometrics.
very useful comments; I also thank Frédéric Sigouin for his excellent assistance with the database. The paper benefited from initial discussions with Charles H. Smith and comments on an earlier draft. Most of the data used in this project was provided by Charles H. Smith in the preparation of two earlier co-authored articles.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Appendix: BID versus CSC rankings: the BID index as a quadratic function of the CSC index
The binomial index of dispersion (or similarity) used in Smith and Georges (2014, 2015) can be computed for any pair (i, j) as:

$$\mathrm{BID}_{i,j} = \frac{n\,(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)} \qquad (5)$$

where a, b, c, d, and n are the counts of composers in each of the five sets $CI_{i,j}$, $I_{i,-j}$, $I_{j,-i}$, $C - CI_{i,j} - DI_{i,j}$ and $C$ (see Table 3). Table 3 permits computation of the frequency of joint presences, the frequency of joint absences, and the frequency of mismatches. When two composers are independent (lack of association), the proportion or frequency of joint influences (a/n) equals the product of the proportions (a + b)/n and (a + c)/n (that is, the proportion of composers in the database that have influenced i and the proportion of composers that have influenced j). If the observed frequency is greater than the one expected under independence, then the two composers are said to be positively associated. Under the condition that all expected frequencies in the presence/absence table (which is computed assuming independence of composers) are at least five and the sample size is sufficiently large, BID is asymptotically $\chi^2$ distributed with one degree of freedom. The $\chi^2$ test of independence can then be used to assess whether there is a statistically significant association between two composers.

A concrete example is given in Table 4 for composer Debussy. One intriguing point in Table 4 is that the rankings produced by the binomial index and the centralised cosine measure are exactly the same for a large portion of the table but then start to dissociate with composer Carter (identified at the 480th position in the first column of Table 4 and at the 241st position in the fifth column). This result has, however, a simple explanation: there is a quadratic relationship between BID and CSC, as proved mathematically in […], and CSC can take negative values. Observing Eqs. (3) and (5), it is clear that the relationship between CSC and BID is quadratic:

$$\mathrm{BID}_{i,j} = n\,\mathrm{CSC}_{i,j}^{2} \qquad (6)$$

Given Eq. (6) and because CSC can take negative values, BID is not a monotonic function of CSC. Hence, the order (or ranking) between CSC and BID is not preserved over the full set of values for CSC. 42 It is clear, for example, that a value of CSC = +x for one pair of composers and CSC = −x for another pair will generate the same value BID = $nx^2$ for both pairs. Hence, the ranking of both pairs of composers will be the same using the BID index but quite different with the CSC index. This is shown in Fig. 7 using some data given in Table 4. The CSC index for Carter and Debussy is −0.038 while the CSC for Hindemith and Debussy is +0.039. The CSC values for the two pairs of composers are clearly different (one is positive and the other negative, while their absolute values are roughly the same), and the rankings for Carter and Hindemith with respect to Debussy on the basis of CSC will therefore be quite distinct, even if these CSC values generate (roughly) the same positive value for BID (+0.7) according to Eq. (6), implying a (roughly) similar ranking under the BID. 43 Finally, that the rankings produced by BID and CSC are exactly the same for a large portion of Table 4 is due to CSC values not symmetrically distributed between  Table 4
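The quadratic link in Eq. (6), and the loss of ranking information it implies, can be checked numerically. The following is only a minimal sketch under stated assumptions: it takes the centralised cosine of two presence/absence vectors to be the phi coefficient of the 2×2 table (Eq. (3) is not reproduced in this appendix), it uses hypothetical counts rather than CMN data, and the function names are illustrative.

```python
import math

def contingency(x, y):
    """Counts (a, b, c, d) for two 0/1 influence vectors of equal length."""
    a = sum(1 for u, v in zip(x, y) if u and v)      # influenced both i and j
    b = sum(1 for u, v in zip(x, y) if u and not v)  # influenced i but not j
    c = sum(1 for u, v in zip(x, y) if v and not u)  # influenced j but not i
    d = len(x) - a - b - c                           # influenced neither
    return a, b, c, d

def csc(a, b, c, d):
    """Centralised cosine, ASSUMED here to equal the phi coefficient."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

def bid(a, b, c, d):
    """Binomial index of dispersion, i.e. the 2x2 chi-square statistic."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def chi2_pvalue_1df(stat):
    """Upper-tail p-value of a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical counts (not CMN data): one positively and one negatively
# associated pair with the same absolute CSC.
positive_pair = (6, 4, 4, 6)
negative_pair = (4, 6, 6, 4)
for counts in (positive_pair, negative_pair):
    n = sum(counts)
    print(counts, csc(*counts), bid(*counts), n * csc(*counts) ** 2)

# Debussy illustration, ASSUMING n = 500 (the size of the composer list).
n_assumed = 500
for name, value in (("Carter", -0.038), ("Hindemith", 0.039)):
    print(name, value, round(n_assumed * value ** 2, 3))
```

Under the assumed n = 500, the two Debussy pairs give BID values of about 0.72 and 0.76, which matches the rough value of 0.7 quoted in the text even though the two CSC values have opposite signs. The helper `chi2_pvalue_1df` shows how a BID value could be turned into a p-value for the independence test, provided the expected-frequency conditions stated in the appendix hold.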