Bayesian phylogenetic analysis of Philippine languages supports a rapid migration of Malayo-Polynesian languages

King, Benedict; Greenhill, Simon J.; Reid, Lawrence A.; Ross, Malcolm; Walworth, Mary; Gray, Russell D.

doi:10.1038/s41598-024-65810-x

Bayesian phylogenetic analysis of Philippine languages supports a rapid migration of Malayo-Polynesian languages

Article
Open access
Published: 28 June 2024

Volume 14, article number 14967, (2024)
Cite this article

Download PDF

You have full access to this open access article

Scientific Reports

Bayesian phylogenetic analysis of Philippine languages supports a rapid migration of Malayo-Polynesian languages

Download PDF

Benedict King¹,
Simon J. Greenhill²,
Lawrence A. Reid^3,4,
Malcolm Ross⁵,
Mary Walworth¹ &
…
Russell D. Gray^1,6

Abstract

The Philippines are central to understanding the expansion of the Austronesian language family from its homeland in Taiwan. It remains unknown to what extent the distribution of Malayo-Polynesian languages has been shaped by back migrations and language leveling events following the initial Out-of-Taiwan expansion. Other aspects of language history, including the effect of language switching from non-Austronesian languages, also remain poorly understood. Here we apply Bayesian phylogenetic methods to a core-vocabulary dataset of Philippine languages. Our analysis strongly supports a sister group relationship between the Sangiric and Minahasan groups of northern Sulawesi on one hand, and the rest of the Philippine languages on the other, which is incompatible with a simple North-to-South dispersal from Taiwan. We find a pervasive geographical signal in our results, suggesting a dominant role for cultural diffusion in the evolution of Philippine languages. However, we do find some support for a later migration of Gorontalo-Mongondow languages to northern Sulawesi from the Philippines. Subsequent diffusion processes between languages in Sulawesi appear to have led to conflicting data and a highly unstable phylogenetic position for Gorontalo-Mongondow. In the Philippines, language switching to Austronesian in ‘Negrito’ groups appears to have occurred at different time-points throughout the Philippines, and based on our analysis, there is no discernible effect of language switching on the basic vocabulary.

Phylogenetic evidence for Sino-Tibetan origin in northern China in the Late Neolithic

Article 24 April 2019

Dated phylogeny suggests early Neolithic origin of Sino-Tibetan languages

Article Open access 27 November 2020

The origin and expansion of Pama–Nyungan languages across Australia

Article 12 March 2018

Introduction

Approximately 4500 years ago, Austronesian languages spread from their homeland in Taiwan to Island South-East Asia, and then expanded further into New Guinea, Oceania, parts of mainland South-East Asia and Madagascar^1,2. These Austronesian languages outside of Taiwan all belong to the Malayo-Polynesian (MP) subgroup^3,4. Based on archaeological data, it is widely accepted that the initial MP expansion was extremely rapid^5,6. This rapid expansion would have left little time for linguistic innovations to arise and, therefore, few features with which to diagnose language subgroups. A rapid expansion is thus one possible explanation for the difficulty in finding relationships for the multiplicity of subgroups high in the MP tree^7,8. Several scenarios have been proposed regarding the spread of Malayo-Polynesian languages out of Taiwan. There is disagreement over the extent to which languages have remained in place following this original migration or whether present-day language distributions result from a series of secondary migrations or population expansions⁹. It has also been proposed that the MP expansion actually consisted of many different migrations in different directions¹⁰. The archaeological data potentially supports the latter hypothesis, with approximately simultaneous age estimates for archaeological sites in the Philippines and Borneo. However, the archaeological record is too poorly sampled, and the date estimates too imprecise, to clearly distinguish between the alternative hypotheses¹¹.

Austronesian speakers were not the first people in Island South-East Asia (ISEA). Hunter-gatherer populations are thought to have arrived in the Philippines at least 47,000 years ago¹², with possible evidence of human occupation 67,000 years ago^13,14. Relict populations of these original inhabitants remain in the Philippines, including the Aeta of Luzon, Batak of Palawan, Ati of Visaya and the Mamanwa of Mindanao. These groups are often collectively referred to as ‘Negritos’ or ‘Black Filipinos’¹⁵, although they are only very distantly related to each other. All ‘Negritos’ speak Austronesian languages to which they presumably shifted from distantly related and now-extinct indigenous languages. It is debated to what extent these ‘Negrito’ languages retain a substrate from their original languages. Reid¹⁶ suggested that unique lexical items in the Austronesian languages of ‘Negritos’ are potential retentions from their original, replaced languages. Languages of ‘Negritos’ are clearly Austronesian, but often appear only distantly related to languages spoken by their non- ‘Negrito’ neighbors^17,18, suggesting that they were acquired early in the MP expansion¹⁸.

Whether or not there were populations other than ‘Negritos’ in ISEA prior to the MP expansion is more uncertain and, therefore, so are the relative roles of demic and cultural diffusion in the MP expansion. There is evidence that pre-Austronesian populations in Mainland South-East Asia switched to Austronesian from Austro-Asiatic languages^19,20, although a small amount of demic diffusion was detectable in Cham populations¹⁹. In ISEA, Hill, et al.²¹ found that only a small proportion of mitochondrial DNA in ISEA could be linked directly with an Out-of-Taiwan dispersal. Austronesian-speaking populations in ISEA show a mixture of Austronesian and Austro-Asiatic genetic material²², although the relative timing of admixture events is often hard to determine. However, Admixture History Graphs based on the relative co-variance of ancestry components²³ suggest that Papuan-Austroasiatic admixture occurred before the Austronesians arrived in Wallacea²⁴. Further, genetic data from rice crops also challenges the idea that Austronesian expansion was a demographic dispersal associated with agriculture because rice varieties from the Philippines show more affinity with those from Mainland South-East Asia than those from Taiwan²⁵. Larena, et al.²⁶ suggest that the ancestors of today’s Austronesian-speaking ethnic groups in the Philippines arrived in several waves between 15,000 and 8000 years ago, again before the accepted ~ 4500 BP date for the Out-of-Taiwan dispersal.

One hypothesis to explain these discrepancies is that there were Austro-Asiatic speakers in ISEA prior to the MP expansion, and these populations switched to speaking Austronesian languages²⁷. It has been argued that the combination of lexical conservatism and grammatical disparity in MP languages is symptomatic of this type of widespread language shift²⁸. It has also been suggested that the Land Dayak MP languages of Borneo show evidence of an Austroasiatic substrate²⁹, but this evidence is disputed³⁰. Archaeological data is sparse, and much of the evidence for Austroasiatic peoples in Indonesia prior to Austronesian expansion centers around the archaeological site at Gua Sireh. Pottery from this site has carved, cord-wrapped or basket-wrapped paddle-impressed surfaces, more similar to mainland archaeological assemblages than the red-slipped pottery associated with Austronesians³¹. Although the Gua Sireh site has also been suggested to show evidence for early rice agriculture, a more detailed analysis of the pottery sherds revealed no evidence of domestic rice³².

The Philippines, lying immediately south of Taiwan and forming the probable first stepping stone in the expansion³³, are central to debates regarding the dispersal of MP languages. There have long been arguments in favor of a discrete Philippine subgroup of MP, based primarily on lists of putative lexical innovations^34,35,36. This hypothesis is associated with an agricultural-demographic expansion, whereby a single language, Proto-Philippines, emerged sometime after the initial colonization of the Philippines and replaced the remaining languages in the Philippines as its speakers sought new agricultural land, an event often known as a “language leveling episode”^35,37. Such a demographic expansion would explain the relatively low language disparity within the Philippines³⁸ and reconciles the proto-Philippines hypothesis with the expected rake-like language phylogeny that would be expected to emerge from a rapid Out-of-Taiwan expansion⁷.

However, there is an ongoing debate about the validity of the Philippine subgroup. The quality of most of the proposed lexical innovations has been challenged because they are not distributed across all Philippine microgroups^7,39,40. Smith⁷, in reviewing the evidence for Proto-Philippines, concluded that only *bulbul (feather) and *dakə́l (big) might be true replacement innovations. Ross³⁹ questioned why a proto-language with so many putative lexical innovations did not also show evidence of phonological or grammatical innovations. Ross³⁹ tentatively suggested that the languages of the Batanic islands, an island chain between Taiwan and Luzon, might form the first branch of MP and descend from the first Austronesian people to leave Taiwan, but has since abandoned this idea⁴⁰. Although a phonological merger of Proto-Austronesian *z and *d has been argued to define the Philippine languages^36,41, it is attested in several Formosan subgroups⁴² and absent in the Central Luzon subgroup⁴³, suggesting it is of little diagnostic value.

The relationships among the languages in the Philippines remain an open question. Blust⁴⁴ identifies 15 Philippine micro-groups: Bashiic/Batanic, Cordilleran/Northern Luzon, Central Luzon, Inati, Kalamian, Bilic, South Mangyan, Palawanic, Central Philippines (including Tagalog, Bisayan, Bikol, Mansakan), Manobo, Danau, Subanon, Minahasan, Sangiric and Gorontalo-Mongondow. The latter three groups are languages of northern Sulawesi, which are included within the putative Philippine language subgroup. Seven Philippine micro-groups were hypothesized by Blust to form a single higher-level subgroup of Greater Central Philippines (GCP): Central Philippines, South Mangyan, Palawanic, Manobo, Danau, Subanon and Gorontalo-Mongondow. The Greater Central Philippine subgroup is suggested to be the result of a second language levelling episode, similar to the first Philippines-wide event³⁷. GCP is supported by lexical innovations and the merger of Proto-Philippines *g and *R⁴⁴. The inclusion of Gorontalo-Mongondow within the Greater Central Philippine subgroup is significant as this micro-group is located in northern Sulawesi, south of the Sangiric and Minahasan languages, suggesting the GCP expansion bypassed these micro-groups.

Phylogenetic methods are increasingly used to uncover and evaluate language relationships. Gray, et al.⁴⁵ used Bayesian methods on a dataset of Austronesian languages. Regarding the Philippines, they found no support for Greater Central Philippines, with the Gorontalo-Mongondow languages instead forming the sister group to the other Philippine languages However, the sampling of languages in this analysis was non-representative, and the cognate-coding for these Philippines languages was incomplete. Since the publication of Gray, et al.⁴⁵, the Austronesian Basic Vocabulary Database⁴⁶ on which it is based has seen a significant increase in the number of languages included and an overhaul of the cognate coding in Formosan and Philippine languages.

Here we apply Bayesian phylogenetics to Philippine languages data from a revised and updated Austronesian Basic Vocabulary Database. This data includes, for the first time, representatives of the Minahasan, Sangiric, Danao, Inati, Northern Mangyan and Manide-Alabat micro-groups. We aim, in particular, to establish the relationships of northern Sulawesi languages to other Philippine languages, determine the effect of language shift by ‘Negritos’ on lexical evolution and test the hypothesis of language leveling events within the Philippines. Our results are discussed with respect to hypotheses regarding the Out-of-Taiwan dispersal of MP languages.

Results

Summary of phylogenetic relationships

We performed a phylogenetic analysis in BEAST2⁴⁷ of a basic vocabulary dataset of 7565 cognate sets in 185 meanings for 202 Formosan and Philippine doculects. The locations of the sampled doculects are shown in Fig. 1. The summary tree (50% majority-rule consensus tree) of the main analysis is shown in Fig. 2.

We find strong support for a group composed of the Sangiric and Minahasan languages (posterior probability (pp) = 1.0), which is in turn, the sister group to the other Philippine languages (pp = 0.91). We reconstructed lexical innovations on the branch defining the Philippine languages to the exclusion of Sangiric-Minahasan. This revealed eight probable innovations (with posterior probabilities > 0.5; Table 1). Two of these (branch_2, man_male_2) can be discounted as they are found outside the Philippines in other Malayo-Polynesian languages not included in our sample. The other six however appear to be legitimate and geographically widespread innovations.

Table 1 Cognate sets estimated to be innovations in the languages of the Philippines proper and Gorontalo-Mongondow, which together initially diverged from the Sangiric and Minahasan languages of northern Sulawesi. The number in the cognate set name indicates the cognate code from the Austronesian Basic Vocabulary Database.

Full size table

We find strong support for a number of higher-level subgroups within Philippines. Northern Luzon and Batanic languages are grouped together (pp = 1.0), as are the Northern and Southern Mangyan subgroups (pp = 1.0), Tagalog with a handful of ‘Negrito’ languages (Manide-Alabat and Sinauna) (pp = 0.97) and Palawanic with Kalamian (pp = 1.0). There is also evidence for a large Central/Southern Philippine group (pp = 0.97) consisting of the Inati, Bisayan, Bikol, Mansakan, Manobo, Danau, Bilic, and Subanon micro-groups. Kagayanen falls separately to the other Manobo languages in the consensus tree, and only groups with the others with a posterior probability of 0.19. The ‘Negrito’ Mamanwa language does not group with the other Mansakan languages (pp = 0.0). The Bisayan group is not supported, and is found to be paraphyletic with respect to Inati, Mamanwa, and Bikol.

The relationships between six strongly supported higher level subgroups, and the Central Luzon and Gorontalo-Mongondow micro-groups, are highly uncertain. Alternative topologies for the interrelationships of these eight groups are shown in Fig. 3 in order of their posterior probability. Highlighting the uncertainty in these relationships, the single topology with the highest support has a posterior probability of just 0.15, and the top 20 topologies account for just 73% of the posterior sample. Much of this uncertainty is due to the positions of Gorontalo-Mongondow and Central Luzon. Gorontalo-Mongondow is found either as the second branch of the tree, as the sister group to the rest of the Philippine languages after the divergence of the Sangiric-Minahasan group, or nested somewhere with the Central Philippine, Palawanic, Mangyan or Tagalog languages. The Central Luzon languages are either found in a group with the Northern-Luzon and Batanic languages or with Tagalog.

Greater Central Philippines and Gorontalo-Mongondow

We found no evidence for the Greater Central Philippine (GCP) subgroup as traditionally defined, which has a posterior probability of 0 in our analysis. However, if the GCP hypothesis is modified to also include the Bilic languages, we do find some support (pp = 0.43). Whether or not this modified GCP is supported is contingent upon the position of Gorontalo-Mongondow. It is found outside the rest of the Philippines as an early diverging branch with posterior probability 0.43, or groups somewhere with Central Philippine languages with posterior probability 0.56. Henceforth we will use “macro-GCP” to refer to the modified GCP hypothesis (i.e. including Bilic).

We performed a topology test to reveal which lexical cognates provide evidence for the two conflicting positions for Gorontalo-Mongondow (Fig. 4), i.e. that Gorontalo-Mongondow forms an early branch of Philippine languages (diverging from other Philippine languages shortly after the divergence of Sangiric and Minahasan) or nests with Central Philippine languages. When we consider cognates with non-overlapping 50% Highest Posterior Density (HPD) intervals are considered, there are 6 cognates supporting each of the two topologies.

Cognate sets supporting an early branching position for Gorontalo-Mongondow are either widespread proto-MP forms, or forms shared exclusively with Sangiric and Minahasan languages. Left_1 (PMP *ka-wiʀi), todream_1 (PMP *hipi) and who_1 (PMP *i-sai) are found in Malayo-Polynesian languages not included in our analysis, and in all three cases are also not completely absent from the non-Sulawesi languages in our sample. We therefore do not consider these to be evidence for a core-Philippine subgroup. And_27 (e.g. Bantik bo), star_75 (e.g. Tonsea toti') and egg_17 (e.g. Bantik natúʔ) are exclusively shared between members of the Gorontalo-Mongondow, Minahasan and Sangiric languages within our dataset.

Two of the cognate sets supporting a Central Philippine position for the Gorontalo-Mongondow languages appear to be genuine innovations: blood_16 (proto-GCP *dugúq⁴⁴) and yellow_38 (Proto-GCP *darág) are shared exclusively by Gorontalo-Mongondow and Central Philippine languages. Three other cognates supporting a Central Philippine position for Gorontalo-Mongondow are discounted because they are PMP forms found broadly in MP languages outside our sample: wing_2 (PMP *panid), dog_1 (PMP *asu) and no/not_2 (PMP (Zorc) *diaq). Branch_22 (e.g. Proto-Minahasan *paŋa) was possibly lost in macro-GCP.

Language contact

We identified 10 independent transitions of Austronesian languages to ‘Negrito’ people by performing an ancestral state reconstruction (Fig. 5A, Table 2). These transitions are dated to various times within the last 2000 years: the oldest transition (to Sinauna, Inagata Alabat and Manide Labo) is dated to a minimum of 1929 BP, and the youngest (Atta Pamplona) to a maximum of 437 BP (Table 2).

Table 2 Austronesian language switching to ‘Negrito’ populations. Identified by maximum parsimony ancestral state reconstruction on the MCC tree (Fig. 5A). Minimum and maximum ages (BP) represent the nodes bookending the branches on the MCC tree on which the transitions are inferred to have occurred.

Full size table

The Philippine languages spoken by ‘Negritos’ show no discernable impact of contact on the core vocabulary. For each tree in the posterior sample, we recorded the rates of lexical evolution on the branch in the phylogeny in which a transition from a non-Austronesian to an Austronesian language in a ‘Negrito’ population is inferred. These branches did not show elevated rates of lexical evolution when compared to the rest of the tree (Fig. 5).

Discussion

We found no evidence of a North-to-South migration signal in the tree topology. In agreement with Ross⁴⁰, the hypothesis that the languages of the Batanes islands represent a first-order branch of MP³⁹ should be abandoned. Instead, our analysis shows a general pattern of South-to-North dispersal. This pattern rests principally on the deep branching position of the languages of northern Sulawesi and the nesting of Batanic within Luzon, both of which are strongly supported.

Our results show a dominant geographical signal. In our analysis, several groups considered to be only distantly related to other Philippine languages are strongly supported as sister groups to their geographic neighbors. The Kalamian micro-group, previously not considered a close relative of any other micro-group⁴⁸, is grouped with Palawanic with strong support (pp = 1.0). The Northern Mangyan languages have been considered as relatives of the Central Luzon group^49,50, but here we find strong support for a relationship with the Southern Mangyan languages, also from the island of Mindoro, in agreement with a lexicostatistical study⁵¹. Manide-Alabat, considered a deep branch of Philippines not closely related to any other¹⁷, is grouped with neighboring Tagalog along with Sinauna. Conversely Kagayanen, a geographically isolated outlier of the Manobo micro-group, does not group with other Manobo languages.

The dominant geographical signal in our results may partly be explained by an initial rapid Out-of-Taiwan dispersal producing a widely dispersed, undifferentiated Proto-Language. This scenario would be expected to initially produce a rake-like tree topology⁷, with later borrowing due to contact the only remaining signal in the data. This explains the recovery of the unusual groupings listed above.

In addition to the effects of borrowings, our results are also consistent with the hypothesis that the languages of the Philippines form a linkage^40,52. This linkage effect is shown by the unstable relationships in the deeper branches of the tree, best demonstrated by the changing position of Central Luzon, which is generally grouped with Northern Luzon, but also sometimes with Tagalog (Fig. 3).

Gorontalo-Mongondow languages contain different layers of cognate sets, reflecting a possible migration event followed by diffusion between neighboring languages. The evidence from cognate sets (Fig. 4) is consistent with the following scenario (Fig. 6). First, during the initial spread of MP languages in the Philippines, the Sangiric and Minahasan micro-groups split from the rest of the Philippine languages. From this initial period, Gorontalo-Mongondow languages retain a number of cognate sets that are innovations shared with all the other Philippine languages except Sangiric and Minahasan (Fig. 6A). Later, the languages in the Philippines proper underwent a broad North vs South/Central split, during which time Greater Central Philippines cognate sets were acquired (Fig. 6B). In the final stage, People speaking Proto-Gorontalo-Mongondow migrated to northern Sulawesi⁴⁴, following which cognate sets unique to the languages of northern Sulawesi spread by diffusion (Fig. 6C). The result is a conflict between different parts of the lexical data, some showing the historical signal, and others the geographic/diffusion signal. The latter pulls Gorontalo-Mongondow into an earlier branching part of the tree than it might otherwise be (Fig. 6, bottom panels), and leads to an unstable phylogenetic position (Fig. 3).

A similar migration-diffusion scenario is also a likely explanation for our inferred position of Kagayanen. Kagayanen is an outlier Manobo language, the only one not found on Mindanao. Although considered part of the Manobo subgroup, it also shows evidence of diffusion from Bisayan languages⁵³. This conflicting signal likely explains its anomalously deep branching position in the consensus tree (Fig. 2), although some support still exists for uniting it with Manobo.

Conflicting data produce alternative phylogenetic positions based on historical and diffusion signals; in the case of Gorontalo-Mongondow, we interpret these to be with Central Philippines or with Sangiric/Minahasan, respectively. However, the conflicting data can also lead to “compromise” phylogenetic positions. In our posterior sample of trees, Gorontalo-Mongondow can branch off anywhere along the phylogenetic path between Sangiric/Minahasan and Central Philippines, and the consensus tree shows an unresolved, “compromise” position (Fig. 2). We predict that languages that have undergone a significant migration event will often appear in unusually deep branching positions in phylogenies due to conflicting signals from related languages on one hand and neighboring languages on the other.

The proposed language leveling event at the origin of Philippine languages is potentially supported by the long branch between the root of the tree and the divergence of Philippine languages: the branch length has a median length of 1219 years (HPD 471–2093), approximately a quarter of the tree depth, and the divergence of Philippine languages is dated at 3379 BP (HPD 2570–4208). This potentially provides some support for the Proto-Philippines hypothesis, but analysis of a wider sample of Malayo-Polynesian than that used here is required to directly test Proto-Philippines using Bayesian phylogenetics. Although we find some support for a variant of the Greater Central Philippine subgroup of Philippine languages, there is no evidence for the long branch leading to this group that a language levelling event would predict. We therefore find no evidence of a second language-levelling episode.

Our analysis reveals no conclusive signal of contact from language switching in ‘Negrito’ populations. It has been hypothesized that these languages retain words from a pre-Austronesian substrate¹⁶, including many in core vocabulary. In a phylogenetic analysis, retained lexemes would appear as innovations occurring on the branch on which language switching is inferred to have occurred. A substantial number of imported lexemes from a pre-Austronesian language would result in a rapid rate of lexical change being estimated on that branch. Although we find no evidence for such an increase in rates, our analysis does not rule out such retentions occurring at low frequency, outside core vocabulary, or in grammatical features. We also found no evidence that all ‘Negrito’ populations switched to MP languages at a relatively early stage, as previously hypothesized¹⁸. Instead, language switching appears to have occurred at various times in different languages (Table 2). The lack of evidence for early switching to Austronesian by ‘Negrito’ speakers suggests that there may have been minimal contact between Austronesian and ‘Negrito’ populations during the initial Out-of-Taiwan migration.

A lack of evidence for contact in ‘Negrito’ languages raises the probability that switching to Austronesian languages involved wholesale adoption of the lexicon with little retention from previous languages. This has potential implications for the spread of MP languages more generally, given that genetic data hints at a possible pre-Austronesian presence of East Asian peoples in ISEA. Whether MP languages result from a mixture of language-switching and demographic expansion or a purely demographic expansion, our results suggest that these two hypotheses may be difficult to distinguish based on lexical data, since uncontroversial examples of language switching in ‘Negritos’ leave no obvious pattern.

Our study uses several features available in the BEAST2 framework for linking the results of the analysis directly to the data that ultimately support it. We applied topology tests to examine the strength of evidence for and against alternative positions for Gorontalo-Mongondow, and also probabilistically reconstructed lexical innovations occurring on branches of interest. These kinds of analyses add an extra dimension to Bayesian analysis of linguistic data, allowing assessments about the strength of the evidence supporting a particular hypothesis beyond the posterior probability values returned by standard analyses.

Overall, our results conclusively reject a simplistic North-to-South dispersal of Austronesian languages in the Philippines. Instead, we propose an initial rapid expansion from the south, followed by high levels of diffusion across language chains, including repeated language shifts from ‘Negrito’ to Austronesian. Our investigation of the data also reveals substantial effects of contact on the distribution of lexical cognates. In contrast, there is little evidence for secondary demographic expansion and language levelling events beyond a possible event at the origin of Philippine languages and the migration of Gorontalo-Mongondow. This suggests a dominant role for cultural diffusion in the Philippines following Austronesian expansion. Our implementation of several methods to scrutinize the results of our Bayesian analysis serve as a template for Bayesian analysis of linguistic data in future studies.

Materials and methods

We assembled a dataset of Philippine and Formosan languages from the Austronesian Basic Vocabulary Database⁴⁶, discarding any doculects with less than 80 cognate sets. The sample consisted of 202 doculects: 147 from the Philippines and an outgroup of 55 Formosan doculects. The data comprised 7565 cognate sets in 185 meanings and is curated at https://github.com/SimonGreenhill/abvd_philippines. A table showing the languages and their sources is included (SI Appendix, Table S1). The cognate coding for Formosan and Philippine languages has been extensively revised since the analysis of Gray, et al.⁴⁵ by L Reid (Philippines) and M Ross (Formosan).

The data were analyzed in BEAST2.7.3⁴⁷. We applied the covarion model^54,55, which allows cognates to switch between hidden states with slow and fast rates of change. We set the frequencies of the hidden states to (0.5, 0.5), the default in BEAST. The relative frequencies of states absent and present were set to their empirical values (0.992, 0.008). We applied an ascertainment bias correction because all-zero cognate sets are not observed⁵⁶, and applied this correction separately to each meaning to account for differing patterns of missing data. Cognate sets in the Austronesian Basic Vocabulary Database can be divided into “primary cognate sets” and “sub-cognate sets”, with the latter encoding sound changes that modify an underlying primary cognate set. These subcognate sets remain incompletely or inconsistently coded and were therefore removed. The remaining primary cognate sets were partitioned into bins based on the number of cognate sets in each meaning, with bins of 1–10, 11–20 etc. cognate sets. We applied a yule model tree prior with a lognormal prior (− 7.5, 2.0) on the birth rate and a lognormal relaxed clock with a lognormal prior (− 10.0, 2.0) on the mean branch rate and a gamma distribution (0.5396, 0.3819) on the standard deviation. The prior on the root of the tree was a normal distribution on 5200 years BP, with a standard deviation of 100, based on the results of Gray, et al.⁴⁵.

We ran the analysis for 100 million generation across 3 independent runs. Convergence was confirmed in Tracer⁵⁷, with ESS > 200 for all parameters. The log files from the 3 runs were combined following the removal of 10% burn-in.

We performed two types of tests to determine cognate sets supporting various topological relationships. The first was a topology test⁵⁸, to examine alternative phylogenetic positions for Gorontalo-Mongondow. We ran one analysis with a constraint enforcing Gorontalo-Mongondow to fall outside the languages of the Philippines proper, i.e. an early branching position close to Sangiric and Minahasan. We performed a second analysis with Gorontalo-Mongondow constrained to group with Central/Southern Philippine languages. We recorded the log likelihood of each cognate set throughout the MCMC in both the analyses. The Highest Posterior Density Interval of each cognate set was then calculated separately for each analysis. Cognate sets with non-overlapping 50% HPD intervals between the two analyses were then considered as significant, supporting one or the other phylogenetic position for Gorontalo-Mongondow.

In the second test, we estimated the lexical innovations occurring in the lineage leading to a group of interest (e.g. all Philippine languages excluding Sangiric and Minahasan). We performed ancestral state reconstruction of all cognate sets at the most recent common ancestor (MRCA) node of the group of interest, as well as at the immediately ancestral node (also known as the origin node). Cognate sets reconstructed as absent (covarion states 0 or 2) at the origin node and as present (covarion states 1 or 3) at the MRCA node can be designated as innovations, which were logged to an output file using a newly formulated beast2 java class (InnovationLogger) curated at https://github.com/king-ben/innovations. These reconstructions are performed at every sampled generation of the MCMC chain, thereby leading to a probabilistic estimation: the posterior probability of a particular innovation is the proportion of the sampled generations in the MCMC in which an innovation of a cognate is reconstructed.

Data availability

Data and analysis code are available at https://doi.org/10.6084/m9.figshare.22347916.

References

Bellwood, P. A hypothesis for Austronesian origins. Asian Perspect. 26, 107–117 (1984).
Google Scholar
Blust, R. The Austronesian homeland: A linguistic perspective. Asian Perspect. 26, 45–67 (1984).
Google Scholar
Blust, R. The Austronesian homeland and dispersal. Ann. Rev. Linguist. 5, 417–434 (2019).
Article Google Scholar
Blust, R. A. The proto-Austronesian pronouns and Austronesian subgrouping. Work. Pap. Linguist. Univ. Hawaii Honol. 9, 1–15 (1977).
Google Scholar
Bellwood, P. & Dizon, E. Z. in Past Human Migrations in East Asia: Matching Archaeology, Linguistics and Genetics (eds Alicia Sanchez-Mazas et al.) 23–39 (Routledge, Taylor & Francis Group, 2008).
Spriggs, M. Archaeology and the Austronesian expansion: Where are we now?. Antiquity 85, 510–528 (2011).
Article Google Scholar
Smith, A. D. The Western Malayo-Polynesian problem. Ocean. Linguist. 56, 435–490 (2017).
Article Google Scholar
Pawley, A. in Selected Papers from the Eighth International Conference on Austronesian Linguistics. (eds Elizabeth Zeitoun & Paul Jen-kuei Li) 95–138 (Academia Sinica).
Blust, R. Response to Comments on “The Resurrection of Proto-Philippines”. Ocean. Linguist. 59, 450–479 (2020).
Article Google Scholar
Blench, R. in Austronesian Diaspora: A New Perspective (eds Bagyo Prasetyo, Titi Surti Nastiti, & Truman Simanjuntak) 77–104 (Gadjah Mada University Press, 2021).
Cochrane, E. E., Rieth, T. M. & Filimoehala, D. The first quantitative assessment of radiocarbon chronologies for initial pottery in Island Southeast Asia supports multi-directional Neolithic dispersal. PLoS ONE 16, e0251407 (2021).
Article CAS PubMed PubMed Central Google Scholar
Détroit, F. et al. Upper Pleistocene Homo sapiens from the Tabon cave (Palawan, The Philippines): Description and dating of new discoveries. C. R. Palevol 3, 705–712 (2004).
Article Google Scholar
Détroit, F., Corny, J., Dizon, E. Z. & Mijares, A. S. “Small size” in the Philippine human fossil record: Is it meaningful for a better understanding of the evolutionary history of the Negritos?. Hum. Biol. 85, 45–66 (2013).
Article PubMed Google Scholar
Mijares, A. S. et al. New evidence for a 67,000-year-old human presence at Callao Cave, Luzon, Philippines. J. Hum. Evol. 59, 123–132 (2010).
Article PubMed Google Scholar
Lobel, J. W. Philippine and North Bornean Languages: Issues in Description, Subgrouping, and Reconstruction, University of Hawai’i (2013).
Reid, L. A. Possible non-Austronesian lexical elements in Philippine Negrito languages. Ocean. Linguist. 33, 37–72 (1994).
Article Google Scholar
Lobel, J. W. Manide: an undescribed Philippine language. Ocean. Linguist. 49, 478–510 (2010).
Article Google Scholar
Reid, L. A. Who are the Philippine negritos? Evidence from language. Hum. Biol. 85, 329–358 (2013).
PubMed Google Scholar
Liu, D. et al. Extensive ethnolinguistic diversity in Vietnam reflects multiple sources of genetic diversity. Mol. Biol. Evol. 37, 2503–2519 (2020).
Article CAS PubMed PubMed Central Google Scholar
Peng, M.-S. et al. Tracing the Austronesian footprint in Mainland Southeast Asia: A perspective from mitochondrial DNA. Mol. Biol. Evol. 27, 2417–2430 (2010).
Article CAS PubMed Google Scholar
Hill, C. et al. A mitochondrial stratigraphy for island southeast Asia. Am. J. Hum. Genet. 80, 29–43 (2007).
Article CAS PubMed Google Scholar
Lipson, M. et al. Reconstructing Austronesian population history in Island Southeast Asia. Nat. Commun. 5, 4689 (2014).
Article ADS CAS PubMed Google Scholar
Pugach, I. et al. The complex admixture history and recent southern origins of Siberian populations. Mol. Biol. Evol. 33, 1777–1795 (2016).
Article CAS PubMed PubMed Central Google Scholar
Oliveira, S. et al. Ancient genomes from the last three millennia support multiple human dispersals into Wallacea. Nat. Ecol. Evolut. 6, 1024–1034 (2022).
Article Google Scholar
Alam, O. et al. Genome analysis traces regional dispersal of rice in Taiwan and Southeast Asia. Mol. Biol. Evol. 38, 4832–4846 (2021).
Article CAS PubMed PubMed Central Google Scholar
Larena, M. et al. Multiple migrations to the Philippines during the last 50,000 years. Proc. Natl. Acad. Sci. 118, e2026132118 (2021).
Article CAS PubMed PubMed Central Google Scholar
Blench, R. Was there an Austroasiatic presence in Island Southeast Asia prior to the Austronesian expansion?. Bull. Indo-Pacif. Prehist. Assoc, 30, 133–144 (2010).
Google Scholar
Donohue, M. & Denham, T. Farming and language in Island Southeast Asia: Reframing Austronesian history. Curr. Anthropol. 51, 223–256 (2010).
Article Google Scholar
Adelaar, K. A. in The Austronesians (eds P Bellwood, J J Fox, & D Tryon) 81–102 (Australian National University, 1995).
Reid, L. A. Modeling the linguistic situation in the Philippines. Senri Ethnol. Stud. 98, 91–105 (2018).
Google Scholar
Bellwood, P. Prehistory of the Indo-Malaysian Archipelago (University of Hawai’i Press, 1997).
Book Google Scholar
Barron, A. et al. Sherds as archaeobotanical assemblages: Gua Sireh reconsidered. Antiquity 94, 1325–1336 (2020).
Article Google Scholar
Bellwood, P. & Dizon, E. The Batanes archaeological project and the “Out of Taiwan” hypothesis for Austronesian dispersal. J. Austron. Stud. 1, 1–33 (2005).
Google Scholar
Zorc, R. D. in FOCAL II: Papers from the Fourth International Conference on Austronesian Linguistics (eds Paul Geraghty, Lois Carrington, & S A Wurm) 147–173 (Pacific Linguistics, 1986).
Blust, R. in Selected Papers from the Eighth International Conference on Austronesian Linguistics. (eds Elizabeth Zeitoun & Jen-Kuei Paul Li) 31–94 (Academia Sinica).
Blust, R. The resurrection of Proto-Philippines. Ocean. Linguist. 58, 153–256 (2019).
Article Google Scholar
Blust, R. in Current Issues in Philippine Linguistics and Anthropology: Parangal Kay Lawrence A. Reid (eds Hsiu-chuan Liao & Carl R Galvez Rubino) 31–68 (The Linguistic Society of the Philippines and SIL Philippines, 2005).
Blust, R. Chamorro historical phonology. Ocean. Linguist. 39, 83–122 (2000).
Article Google Scholar
Ross, M. The Batanic languages in relation to the early history of the Malayo-Polynesian subgroup of Austronesian. J. Austron. Stud. 1, 1–24 (2005).
ADS Google Scholar
Ross, M. Comment on Blust “The Resurrection of Proto-Philippines”. Ocean. Linguist. 59, 366–373 (2020).
Article Google Scholar
Charles, M. Problems in the reconstruction of Proto-Philippine phonology and the subgrouping of the Philippine languages. Ocean. Linguist. 13, 457–509 (1974).
Article Google Scholar
Liao, H.-C. A Reply to Blust (2019) “The Resurrection of Proto-Philippines”. Ocean. Linguist. 59, 426–449 (2020).
Article Google Scholar
Chen, V., Gallego, K., Kuo, J., Stead, I. & van der Voorn, B. Contact or inheritance? New evidence on the Proto-Philippines debate. Zenodo https://doi.org/10.5281/zenodo.11032274 (2024).
Blust, R. The greater central Philippines hypothesis. Ocean. Linguist. 30, 73–129 (1991).
Article Google Scholar
Gray, R. D., Drummond, A. J. & Greenhill, S. J. Language phylogenies reveal expansion pulses and pauses in Pacific settlement. Science 323, 479–483 (2009).
Article ADS CAS PubMed Google Scholar
Greenhill, S. J., Blust, R. & Gray, R. D. The Austronesian basic vocabulary database: From bioinformatics to lexomics. Evolut. Bioinform. 4, 271–283 (2008).
Google Scholar
Bouckaert, R. et al. BEAST 2.5: An advanced software platform for Bayesian evolutionary analysis. PLoS Comput. Biol. 15, e1006650 (2019).
Article CAS PubMed PubMed Central Google Scholar
Himes, R. S. The Kalamian microgroup of Philippine languages. Stud. Philipp. Lang. Cult. 15, 54–72 (2007).
Google Scholar
Himes, R. S. The Central Luzon group of languages. Ocean. Linguist. 51, 490–537 (2012).
Article Google Scholar
Zorc, R. D. Internal and external relationships of the Mangyan languages. Ocean. Linguist. 13, 561–600 (1974).
Article Google Scholar
Walton, C. A Philippine language tree. Anthropol. Linguist. 21, 70–98 (1979).
Google Scholar
Zorc, R. D. Reactions to Blust’s “the resurrection of proto-Philippines”. Ocean. Linguist. 59, 394–425 (2020).
Article Google Scholar
Harmon, C. J. W. Kagayanen and the Manobo Subgroup of Philippine Languages. PhD Dissertation, University of Hawaii (1977).
Tuffley, C. & Steel, M. Modeling the covarion hypothesis of nucleotide substitution. Math. Biosci. 147, 63–91 (1998).
Article MathSciNet CAS PubMed Google Scholar
Penny, D., McComish, B. J., Charleston, M. A. & Hendy, M. D. Mathematical elegance with biochemical realism: The covarion model of molecular evolution. J. Mol. Evol. 53, 711–723 (2001).
Article ADS CAS PubMed Google Scholar
Felsenstein, J. Phylogenies from restriction sites: A maximum-likelihood approach. Evolution 46, 159–173 (1992).
PubMed Google Scholar
Rambaut, A., Drummond, A. J., Xie, D., Baele, G. & Suchard, M. A. Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Syst. Biol. 67, 901–904 (2018).
Article CAS PubMed PubMed Central Google Scholar
King, B. Which morphological characters are influential in a Bayesian phylogenetic analysis? Examples from the earliest osteichthyans. Biol. Lett. 15, 20190288 (2019).
Article PubMed PubMed Central Google Scholar

Download references

Acknowledgements

We thank Robert Blust for cognate coding, and Isaac Stead and Russell Barlow for discussions.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Department of Linguistic and Cultural Evolution, Max Planck Institute for Evolutionary Anthropology, 04103, Leipzig, Germany
Benedict King, Mary Walworth & Russell D. Gray
School of Biological Sciences, University of Auckland, Auckland, 1142, New Zealand
Simon J. Greenhill
National Museum of the Philippines, 1000, Ermita, Manila, Metro Manila, Philippines
Lawrence A. Reid
Department of Linguistics, University of Hawaiʻi at Mānoa, Honolulu, HI, 96822, USA
Lawrence A. Reid
School of Culture, History and Language, Australian National University, Canberra, 2601, Australia
Malcolm Ross
School of Psychology, University of Auckland, Auckland, 1142, New Zealand
Russell D. Gray

Authors

Benedict King
View author publications
You can also search for this author in PubMed Google Scholar
Simon J. Greenhill
View author publications
You can also search for this author in PubMed Google Scholar
Lawrence A. Reid
View author publications
You can also search for this author in PubMed Google Scholar
Malcolm Ross
View author publications
You can also search for this author in PubMed Google Scholar
Mary Walworth
View author publications
You can also search for this author in PubMed Google Scholar
Russell D. Gray
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

RG, SJG and MW conceived the project. LR and MR cognate coded the ABVD data. BK and SJG analysed the data. BK wrote the paper with input from all authors.

Corresponding author

Correspondence to Benedict King.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

King, B., Greenhill, S.J., Reid, L.A. et al. Bayesian phylogenetic analysis of Philippine languages supports a rapid migration of Malayo-Polynesian languages. Sci Rep 14, 14967 (2024). https://doi.org/10.1038/s41598-024-65810-x

Download citation

Received: 09 August 2023
Accepted: 24 June 2024
Published: 28 June 2024
DOI: https://doi.org/10.1038/s41598-024-65810-x
Springer Nature Limited

Bayesian phylogenetic analysis of Philippine languages supports a rapid migration of Malayo-Polynesian languages

Abstract

Similar content being viewed by others

Phylogenetic evidence for Sino-Tibetan origin in northern China in the Late Neolithic

Dated phylogeny suggests early Neolithic origin of Sino-Tibetan languages

The origin and expansion of Pama–Nyungan languages across Australia

Introduction