Introduction

Cajanus cajan (L.) Millsp. has, until fairly recently, received relatively little research attention despite its importance in India and other countries worldwide. Colloquially known as the ‘orphan crop’ or ‘poor people’s meat’ due to its high protein content, Cajanus cajan (L.) Millspaugh (Syn.: Cajanus indicus Spreng.; Cajanus flavus DC.; Cytisus cajan L.) from the family Fabaceae, is more commonly known as pigeonpea. It is an important grain legume, particularly in rain-fed agricultural regions in the semi-arid tropics, as well as an excellent, high-protein cover/forage for livestock (Duke 1983; Pal et al. 2011; Randhawa 1958) which can be intercropped with sorghum and/or millets (Shetty and Rao 1981). The genus Cajanus has 32 species (Mallikarjuna et al. 2012, 411) with 18 occurring in India (Mallikarjuna et al. 2011). Despite some claims for an African origin (Watt 1889; Purseglove 1976) it has been convincingly demonstrated that the likely wild progenitor was Cajanus cajanifolius found today in eastern India (van der Maesen 1986), including the modern state of Odisha and adjacent states, an origin earlier suggested by Haines (1921-1925). This is reinforced by modern genetic data (Kassa et al. 2012). Today the largest producer of pigeonpea is India (Pal et al. 2016; Nwokolo 1996), but it is also found across large parts of Southeast Asia, Africa and the Caribbean where it is commonly known as congo pea, or gungo pea.

As a subsistence pulse crop, C. cajan contains high levels of protein and amino acids such as methionine, lysine and tryptophan, and is an important source of dietary vitamins and minerals, particularly B vitamins; it is therefore especially important for people living on subsistence diets (Oshodi et al. 2009). It is grown in large quantities in modern India (Fig. S1a), and shows increasing areas of cultivation (Fig. S1b). Pigeonpea seeds are found in a huge diversity of flavours (from bitter to sweet) and colours (from black to creamy white) (Upadhyaya et al. 2005). Pigeonpea is most commonly used to make ‘dhal’ (soaked, dried, hulled, and split seeds) (Shinde et al. 2017), and in parts of Southeast Asia the seeds are used instead of soya bean to make tempe or tofu (Shurtleff and Aoyagi 2013; Owens et al. 2015; van der Maesen and Somaatmadja 1989), and to make noodles in Myanmar (APO 2003). Immature green pigeonpeas can also be harvested and cooked as a fresh vegetable, which is more common in the Caribbean and Southeast Asia. The leaves are considered an excellent fodder for cattle and the dry wood is considered a good fuel (Watt 1889). Thus, nearly all parts of the pigeonpea plant are utilized and integrated into daily use. A number of potential medicinal uses have also been explored (e.g. Allen and Allen 1981; Al-Saeedi and Hossain 2015; Uchegbu and Ishiwu 2016).

Botanically, it is an annual or erect short-lived perennial shrub, measuring 1–2 m in height, with a variable habit. Modern day populations grow in a range of soil types and climates, throughout tropical and subtropical regions of the world from South Asia to Australia (Khoury et al. 2015; Kassa et al. 2012). It thrives with an annual rainfall of 600–1000 mm yet is also drought tolerant and can be grown in areas with < 300 mm rainfall (Kingwell-Banham and Fuller 2013). Hence, it is an important crop for small-scale farmers in semi-arid areas where rainfall is low or variable. C. cajan has an extensive habitat range throughout India growing at altitudes to over 1800 m (Watt 1889). Despite adaptation to versatile environmental conditions, crop productivity has remained stagnant for almost the last 5 decades at production levels of roughly 750 kg/ha (Fig. S1a) (Bohra et al. 2012).

Cultivated types can be grouped into two varieties: C. cajan (L.) Millsp. var. bicolor DC. (Hindi arhar), a late-maturing, large, bushy plant which normally takes between 6 and 11 months to reach maturity, and a short-season variety, C. cajan var. flavus DC. (Hindi tuvar), which can reach maturity more rapidly, within only 3–4 months (Kingwell-Banham and Fuller 2013). Whether or not these two varieties are phylogenetic subspecies, they are useful varietal groups as they correspond to different cultivation regimes (season length) and different colloquial names (see below). Although the arhar types are regarded as more primitive and closer to the perennial ancestor (Smartt 1990), both varieties have a long history in excess of 2000 years, judging by historical linguistic evidence (see Supplement section), suggesting that this differentiation took place early in the evolution of this crop. C. cajan has one of the highest yields per area in comparison with other South Asian pulses (Fig. S1). In contrast, archaeologically, the recovery of any Cajanus sp. is quite low in comparison with other pulses from South Asian archaeological sites (Fuller and Harvey 2006; Harvey et al. 2006; Fuller and Murphy 2018; Murphy and Fuller 2016; Smartt 1990), suggesting that perhaps Cajanus use was more limited in the past and that it may have been domesticated later than some other legume taxa in India. Cajanus spp. are likewise found at very few sites in Southeast Asia compared to another South Asian pulse, Vigna radiata (L.) Wilczek, which is widely found (Castillo et al. 2016, 2018a).

Recent genetics, historical linguistics and evidence for origins

Cajanus cajanifolius (Haines) Maesen is accepted as the progenitor species of cultivated C. cajan (L.) Millsp. (van der Maesen 1986; Smartt 1990; Mallikarjuna et al. 2012; Sinha et al. 2015). Domesticated C. cajan possesses 75% less allelic diversity than the progenitor clade of wild Indian species, suggesting a severe “domestication bottleneck” (Saxena et al. 2014; Al-Saeedi and Hossain 2015). Hence, pigeonpea’s improvement is increasingly reliant on introgression with wild forms and their diversity of phenotypic traits, a practice that would benefit from knowledge of its domestication history and early selection from the wild species (Kameswara Rao et al. 2003; Kassa et al. 2012; Pandey et al. 2008; Upadhyaya et al. 2007, 2013). This is especially important as habitat loss threatens wild Cajanus spp. populations (Khoury et al. 2015; Sahai and Rawat 2015).

The highest level of polymorphism in wild relatives and landraces was found within the states of Madhya Pradesh and Andhra Pradesh, leading Saxena et al. (2014) to infer domestication somewhere along India’s eastern coast (e.g. Andhra Pradesh), further south than the Odishan origins inferred by Fuller and Harvey (2006) on the basis of van der Maesen’s (1986) wild distribution map. Archaeobotanical evidence should ultimately provide more solid evidence.

Historical linguistic evidence can also inform on the origins of pigeonpea. A compilation of names for pigeonpea across numerous languages in India and beyond is provided in Supplementary Information and Tables S1 and S2. Ancient languages in India differentiated early on between the long growth cycle varieties (var. bicolor) and the short-cycle varieties (var. flavus). We can infer two early roots for the long-cycle forms, one from early Indic (Indo-European) and one from Dravidian. The first is the source of Hindi arhar, derived from ancient Prakrit adhai, with loan word names evident in some Austroasiatic (Munda) languages in eastern India as well as some Southeast Asian languages (e.g. Thai hae) and distant west African languages, like Togo and Hausa. An alternative source was the reconstructed Early Dravidian form *kar-unti (Southworth 2005), the source of various derivative names based on the element kan or gan, including some Southeast Asian names, such as in Burmese and Malay, as well as some southeastern African names, such as Malawi kardis. For short-cycle field crops there is a single widespread cognate set found in both the reconstructions of Early Dravidian (*tu-var-) and Old Indo-Aryan (*tubarī-). These shared terms suggest that the evolution of short-cycle varieties may have taken place near where these language families overlap geographically, namely around Odisha, Chhattisgarh and/or northeast Andhra Pradesh, which is likely to be in or near the zone of domestication and early evolution of domesticated Cajanus cajan.

Materials and methods

Archaeobotany

The present study adopts a fourfold approach, drawing on several lines of evidence. First, we surveyed herbarium specimens of likely wild progenitor populations held in the herbarium collections of the Royal Botanic Gardens, Kew and the Natural History Museum, London (NHM) (Table 1). These provided potential augmentation to the distribution of wild populations mapped by van der Maesen (1986), which can be combined with the geographic distribution of climatic conditions similar to those where C. cajanifolius has been found. Second, we took measurements to provide an extensive morphometric baseline for seed size in modern domesticated and wild pigeonpea (Table 2), which provides a basis from which to infer the domesticated status of archaeological pigeonpea based upon seed measurements. Third, we recorded measurements of archaeological pigeonpea, both of specimens in our collections and those in the published literature (Table 3). These provide a time series of seed size data for regional populations, especially for the Deccan and South India, which allows us to trace the evolution of seed size as one aspect of the domestication syndrome in this species. Lastly, we provide an updated database on the archaeological occurrence of pigeonpea in time and space, allowing us to infer the region(s) in which it first occurred in the human diet and/or cultivation systems in prehistory (Table 3, Fig. 4).

Table 1 Cajanus accessions examined from Natural History Museum, London, UK
Table 2 Modern Cajanus seeds measured

Initially, herbarium collections for Cajanus, including those from South Asia and from Africa, were surveyed with the kind permission of the Natural History Museum, London (Fig. 1). This included specimens filed under the genus Atylosia (synonymous with Cajanus). In addition, where seeds were visible on the herbarium specimen, or loose in attached pouches, these were measured, both for wild and cultivated populations, including wild populations from Africa; their seed metrics contribute to a baseline for the size range of wild seeds (Supplementary Table S4).

Fig. 1

Herbarium specimen of Cajanus cajanifolius (filed as Atylosia cajanifolia) (The Natural History Museum, London)

Seed measurements were also taken on modern crop populations of C. cajan from several reference collections, including the UCL archaeobotany reference collection, augmented with additional germplasm obtained from the USDA, as well as from the Economic Botany collection of the Royal Botanic Gardens, Kew (Table 2 and Supplementary Table S4). Archaeological specimens from South Asian and Southeast Asian sites in our collections were also measured, supplemented by data compiled from the published literature (Table 3 and Supplementary Table S5). Archaeobotanically, pigeonpea identification is aided by a distinctive apostrophe-shaped shoot bud (plumule) within the embryo that is often visible as an imprint on the charred split cotyledon (Fig. 2).

Table 3 Measured archaeological specimens of Cajanus
Fig. 2

Archaeological examples of Cajanus cajan. a Example from Neolithic Sanganakallu, Karnataka, interior of cotyledon with plumule visible (from Fuller 1999). b Example from Chalcolithic Gopalpur, Odisha drawn by DQF, from Harvey et al. (2006); c Example from Early Historic Paithan, Maharashtra, with plumule highlighted (Photo by C). d Example from Terrace of the Leper King, Angkor Thom, Cambodia (Photo by CCC). (Color figure online)

Results

The known distribution of modern Cajanus reveals a limited latitudinal distribution of the ‘wild’ sister species and comparatively broad distribution of modern day domesticates of Cajanus (Fig. 3). Cajanus cajanifolius is clustered in eastern India in what is now the modern state of Odisha (formerly Orissa). What is notable is that whilst Cajanus spreads outside of its native habitat in South Asia into Southeast Asia it does not cross the ecological boundary of the Himalayas. In China several southern provinces have modern populations of Cajanus cajan, mentioned in floristic sources (Hu 2005; Ren and Gilbert 2010) and noted in our herbarium survey, but none of these appears to be cultivated. These occur as “wild” or free-growing populations, but near human disturbance. Our observations suggest that these have dehiscent pods, rather than domesticated type non-dehiscent pods. Other characteristics resemble C. cajan, suggesting that these populations should be regarded as feral, representing past “escapes” from cultivation. This then implies that at some time in the past pigeonpea was cultivated across parts of Southern China as far east as Taiwan, and that cultivation has been abandoned subsequently. The major transformations of agriculture allowed by the introduction of New World taxa, like maize and Phaseolus and Canavalia beans might have altered the attractiveness of Cajanus cultivars in some regions.

Fig. 3

(Map created using QGIS v. 2.12.3)

Map displaying the known modern range of regular Cajanus cajan cultivation in the Old World, areas of inferred former cultivation based on feral populations in China, and sites of cultivation and/or feral populations in Island Southeast Asia (blue circles beyond shaded area). The distribution of wild progenitors (yellow stars) is augmented from van der Maesen (1986) by this study.

Archaeobotanical evidence for Cajanus is richest for India, with a few finds from mainland Southeast Asia (Fig. 4). As is evident, most of these finds lie outside the likely zones of domestication around the Southern Odisha, Northern Andhra, and eastern Maharashtra borders. In addition, the earliest finds to date come from the South Indian Neolithic at Piklihal and Sanganakallu, implying earlier cultivation and dispersal of this species prior to 1650 BC.

Fig. 4

Map displaying all the known archaeological specimens of Cajanus cajan in relation to the modern wild distribution of Cajanus cajanifolius (after van der Maesen 1986, and this study). Archaeobotanical finds of Cajanus cajan or cf. Cajanus cajan. Sites numbered: 1. Hallur, 2. Piklihal, 3. Kadebakele, 4. Sanganakallu, 5. Peddamudiyam, 6. Nevasa, 7. Paithan (2 phases), 8. Bhokardan, 9. Bhon, 10. Paturda, 11. Kholapur, 12. Tuljapur Garhi, 13. Kaundinyapur, 14. Bhagimohari, 15. Mahurzkari (2 phases), 16. Charda, 17. Golbai Sassan, 18. Gopalpur, 19. Vikrampura, 20. Wari Bateshwar, 21. Phu Khao Tong, 22. Khao Sam Kaeo, 23. Angkor Thom Terrace of the Leper King

A scatter plot of the length and width measurements of all the modern specimens, including wild species of Cajanus, shows a separation of the domesticated types from the wild progenitor (C. cajanifolius) and other congeneric wild taxa (Fig. 5). A great deal of intra-species variation is present within the domesticated C. cajan specimens. There are broadly two forms of domesticated seeds: those with a low Length/Width ratio, i.e. small but “tall” seeds (lower left in Fig. 5; L/W ratio 0.8–1), and those with large and long seeds (i.e. with L/W ratios > 1 and width > 4 mm). Although the wild specimens we have been able to measure are limited, they all appear to have low L/W ratios that fall between 0.68 and 0.8, and widths of < 4 mm (Fig. 6). This indicates that seed L/W ratio appears to be a useful way to determine whether seeds are domesticated or wild. In addition, seed Length appears to have increased significantly, on average, in domesticated C. cajan, whereas seed Width may not have (Fig. 7). A t-test of Width indicated no significant difference in the mean Width, whereas a t-test of Length is significant (p = 8.05 × 10⁻¹¹); a Kolmogorov–Smirnov test for equal distributions in the Length of C. cajan versus C. cajanifolius indicates significantly different distributions (p = 0.0001, including Monte Carlo permutations). This suggests that we would expect to see an increase in seed Length over time during the domestication process, as well as an increase in L/W ratio.
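
The comparison described above can be sketched in a few lines of Python. This is a minimal illustration only: the measurements are hypothetical values chosen to mimic the reported pattern (they are not data from this study), the function names are ours, and Welch's unequal-variance form of the t statistic is an assumption, since the text does not state which variant was used.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical (length_mm, width_mm) pairs for illustration only;
# these are NOT measurements from this study.
domesticated = [(3.6, 3.9), (4.6, 3.8), (5.4, 4.2), (5.0, 3.7), (5.9, 4.3)]
wild = [(2.9, 3.9), (2.7, 3.8), (3.0, 3.9), (2.8, 3.7)]


def lw_ratio(length_mm, width_mm):
    """Length/Width ratio of a single seed."""
    return length_mm / width_mm


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom.

    Assumed variant: the source only says "t test".
    """
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


dom_ratios = [lw_ratio(l, w) for l, w in domesticated]
wild_ratios = [lw_ratio(l, w) for l, w in wild]

t_length, df_length = welch_t([l for l, _ in domesticated], [l for l, _ in wild])
t_width, df_width = welch_t([w for _, w in domesticated], [w for _, w in wild])

print(f"L/W domesticated: {min(dom_ratios):.2f} to {max(dom_ratios):.2f}")
print(f"L/W wild:         {min(wild_ratios):.2f} to {max(wild_ratios):.2f}")
print(f"Length: t = {t_length:.2f} (df ~ {df_length:.1f})")
print(f"Width:  t = {t_width:.2f} (df ~ {df_width:.1f})")
```

Converting t to a p value requires the t distribution (available in most statistics packages); that step is left out here to keep the sketch free of dependencies.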

Fig. 5

Scatterplot of modern length and width measurements (mm) of Cajanus seeds. The black line represents the estimated division between domesticated and wild Cajanus

Fig. 6

Histogram of seed Length:Width ratios in modern Cajanus cajan and C. cajanifolius

Fig. 7

Boxplots of seed length and seed width in modern Cajanus cajan and its wild progenitor C. cajanifolius

It is well known that archaeological seeds, preserved by charring, undergo shrinkage, and this is often estimated to be on the order of 10–20%, with 20% used as the estimate for pulses (e.g. Fuller and Harvey 2006; Fuller and Murphy 2018). This leads to the inference that a minimum Length for charred domesticated specimens should be around 4 mm or more (based on the 25th percentile of modern material), whereas the upper end estimated for charred wild seeds is around 3 mm (based on the maximum in modern material). Although actual shrinkage will vary with carbonization conditions in the past, which are difficult to estimate, this is unlikely to affect the ratios of grain dimensions. L/W ratios are therefore suggested to be a useful method for determining domestication status, while a time series of Length (and less likely Width) may also provide additional support and document change that evolved during or after domestication.
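
The shrinkage reasoning can be made concrete with a short sketch. It assumes uniform linear shrinkage in every dimension, which is a simplification, and the function names and the example seed are ours. Under the 20% figure, a charred Length of about 4 mm would correspond to roughly 5 mm in fresh material, and a charred Length of 3 mm to roughly 3.75 mm; these back-calculated values are our arithmetic, not figures reported in the text.

```python
def charred_size(fresh_mm, shrinkage=0.20):
    """Dimension expected after charring under uniform linear shrinkage.

    The 20% default follows the figure quoted for pulses; applying the same
    shrinkage to every dimension is a simplifying assumption.
    """
    if not 0.0 <= shrinkage < 1.0:
        raise ValueError("shrinkage must be a fraction in [0, 1)")
    return fresh_mm * (1.0 - shrinkage)


def fresh_equivalent(charred_mm, shrinkage=0.20):
    """Inverse of charred_size: fresh dimension implied by a charred one."""
    if not 0.0 <= shrinkage < 1.0:
        raise ValueError("shrinkage must be a fraction in [0, 1)")
    return charred_mm / (1.0 - shrinkage)


# Hypothetical fresh seed (mm), for illustration only.
fresh_length, fresh_width = 5.0, 4.0

for s in (0.10, 0.20):
    length = charred_size(fresh_length, s)
    width = charred_size(fresh_width, s)
    # Both dimensions scale by (1 - s), so the L/W ratio is unchanged.
    print(f"shrinkage {s:.0%}: L = {length:.2f} mm, "
          f"W = {width:.2f} mm, L/W = {length / width:.2f}")
```

Because the ratio cancels the shrinkage factor, it is insensitive to the unknown charring conditions, whereas absolute Length thresholds shift with whichever shrinkage value is assumed.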

Turning to the available archaeological finds, it is clear that pigeonpea was established outside its wild distribution by ca. 1500 BC (Fig. 4). Measurements indicate that the L/W ratios fall within the expected domesticated range and do not show any significant changes through time among the archaeological materials (Fig. 8). This suggests that domestication took place prior to the available archaeological finds, i.e. before 1650 BC. When seed size data are plotted against time (Fig. 9), it is also clear that most of these fall in the > 3 mm Length range and therefore outside the predicted wild size. Nevertheless, the earliest specimens available, including one seed from Piklihal, Karnataka, fall below this size and could represent the very end of the trend towards size increase at the tail end of the domestication process (dashed line in Fig. 9). The limited sample size does not allow this trend to be regarded as statistically significant, however, and further archaeological finds are necessary for the domestication process to be studied in this pulse.
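
The two criteria applied to the archaeological material can be written as a simple decision rule. The sketch below is illustrative only: the thresholds are the 0.82 L/W ratio given in the Fig. 8 caption and the roughly 3 mm upper Length estimated for charred wild seeds, while the function name and the example specimens are hypothetical and are not measurements from any site.

```python
DOMESTICATED_MIN_RATIO = 0.82   # L/W threshold given in the Fig. 8 caption
WILD_MAX_CHARRED_LENGTH = 3.0   # mm, upper end estimated for charred wild seeds


def assess_specimen(length_mm, width_mm):
    """Return (ratio_in_domesticated_range, length_above_wild_size)."""
    if length_mm <= 0 or width_mm <= 0:
        raise ValueError("dimensions must be positive")
    ratio = length_mm / width_mm
    return ratio > DOMESTICATED_MIN_RATIO, length_mm > WILD_MAX_CHARRED_LENGTH


# Hypothetical charred specimens as (label, length_mm, width_mm); not study data.
specimens = [
    ("early, small seed", 2.8, 3.0),
    ("later, larger seed", 4.2, 3.6),
]

for label, length, width in specimens:
    ratio_ok, length_ok = assess_specimen(length, width)
    print(f"{label}: L/W = {length / width:.2f}, "
          f"domesticated-range ratio = {ratio_ok}, "
          f"above wild length = {length_ok}")
```

Keeping the two criteria separate mirrors the argument in the text: a seed can have a domesticated-range ratio while still falling below the 3 mm Length, which is the pattern described for the earliest finds.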

Fig. 8

Plot of seed Length/Width ratio for measured archaeological specimens plotted against estimated median age of specimens. Note ratios above 0.82 are expected to represent domesticated specimens. Currently available archaeological specimens all appear to be domesticated

Fig. 9

Plots of archaeological Cajanus cajan seed dimensions over time. Dashed line represents an interpretation of size trajectory indicated by site and phase assemblages plotted in terms of mean and standard deviation (lines), together with maximum (+) and minimum (−). When only individual specimens are available, they are plotted without lines

Conclusions

This review of the available evidence for pigeonpea (Cajanus cajan) suggests that it was domesticated in ancient India with a long history of use in South Asia. Evidence of extant wild populations, including herbarium specimens surveyed by the authors, suggests a wild distribution in the hills of the northeast peninsula and along the northern east coast of India, especially in the state of Odisha (as per van der Maesen 1986; Saxena et al. 2014), but plausibly extending southwards to Andhra Pradesh. A focal region in which to seek the earliest cultivation is perhaps near the borders of the modern states of Odisha, Telangana, Maharashtra, Andhra Pradesh and Chhattisgarh. These areas are largely under-studied archaeologically. At present, archaeological finds are earliest in the Southern peninsula (i.e. Sanganakallu, Karnataka), distant from wild habitats, and these are older than those from Chalcolithic sites in coastal central Odisha. Both groups of finds are of similar size, and have L/W ratios and Lengths that place them with domesticated populations rather than wild populations. This indicates that the earlier cultivation and domestication process is not represented in archaeological finds available to date and places this process prior to 1650–1700 BC. We have suggested that the end of a trend towards increasing seed size might be represented among the earlier archaeological finds available, although more data are needed to assess whether this is a true and statistically significant trend, and to determine when and where it began. Comparisons with the timing of domestication processes documented in other pulses (e.g. Fuller et al. 2014; Murphy and Fuller 2017) suggest that cultivation should have begun around 1000 years earlier than the currently documented end of the process; we thus infer cultivation was likely to have started 5000–4500 years ago.
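
The date inference above is simple arithmetic, sketched here under stated assumptions: the reference year used to convert BC dates to "years ago" is ours (2020, chosen only for illustration), and the function name is hypothetical.

```python
def bc_to_years_ago(year_bc, reference_year_ad=2020):
    """Years elapsed between a BC year and an AD reference year.

    There is no year 0, hence the subtraction of 1. The reference year is an
    assumption made for this illustration.
    """
    return year_bc + reference_year_ad - 1


END_OF_PROCESS_BC = (1650, 1700)  # "prior to 1650-1700 BC" in the text
LAG_YEARS = 1000                  # "around 1000 years earlier"

start_bc = [year + LAG_YEARS for year in END_OF_PROCESS_BC]
start_years_ago = [bc_to_years_ago(year) for year in start_bc]

print("Inferred start of cultivation (BC):", start_bc)
print("Equivalent in years ago:", start_years_ago)
```

Adding about 1000 years to 1650–1700 BC gives roughly 2650–2700 BC, which falls within the 5000–4500 years ago bracket stated in the text.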

This calls for further archaeobotanical data to clarify the contexts of initial cultivation and domestication. Domestication would have increased the yields of pigeonpea and made this an increasingly attractive pulse. Nevertheless, on no archaeological site yet sampled does C. cajan dominate the pulse component of the assemblage, suggesting that it was not as predominant in many ancient diets as it is in the present day.

There is no mention of pigeonpea among the plants encountered by a nineteenth century French naturalist in the Mekong valley (Thorel 2001). Despite this absence from the literature, archaeological evidence has recently attested to its presence in the fourteenth to fifteenth century in central Cambodia at Angkor Thom (Castillo et al. 2018a), with earlier Iron Age occurrences (4th–1st c BC) in southern Thailand (Castillo et al. 2016). Archaeobotanical evidence for cultivation in Medieval Cambodia (Castillo et al. 2018a), along with the presence of widespread feral populations in anthropogenic habitats (Fig. 3; Hu 2005; van der Maesen 1980, 257–258), suggests that it may have formerly been more widely cultivated in southern China, and probably also Island Southeast Asia. Traditional cultivation in parts of Burma, Thailand, Laos and the Malay Peninsula (Burkill 1966) suggests that its distribution in Southeast Asia may have become more restricted to upland and rainfed agricultural regions, rather than areas heavily committed to irrigated rice, although the lowland plains of Southeast Asia originally focused on rainfed rice, with transitions to wet rice generally thought to have occurred in the early centuries AD (Castillo and Fuller 2010; Castillo et al. 2018b). Similarly, it has limited cultivation in the mountains of Oman (Gebauer et al. 2007), to which it likely diffused alongside other crops from India and where it fits with summer rainfall cultivation. Some of the reduction in cultivation regions in recent centuries, such as across southern China, may be due to intensification of other crops like rice and the introduction of new pulses such as Phaseolus and Canavalia from the New World.

The wild progenitor is also understudied. It often appears in anthropogenic habitats, and as feral populations outside its likely region of origin. As with another native South Asian pulse, horsegram (Macrotyloma uniflorum) (Fuller and Murphy 2018), the native habitat for wild populations of Cajanus appears to be disappearing, and this presents a critical issue as genetic interbreeding programs are needed to continue to improve the current domesticated species of pigeonpea for future use. Hence, potential areas of wild progenitor stock should be sought out for future botanical and genetic studies of both wild and domesticated pigeonpea populations in South Asia. Archaeobotanical evidence has the potential to shed light on how and where it was cultivated in the past, including regions where it is no longer a crop and where it may therefore have potential for future reintroduction and development.