Exploring Linguistic Diversity in India: A Spatial Analysis
In the context of the modernization debate, social scientists have argued that there is a negative association between linguistic diversity within a country and economic development. India is a land characterized by “unity in diversity” amidst a multicultural society. This is represented by variety in culture such as different languages, religions, castes, house types, dance forms, and dietary patterns (Noble and Dutt, India: cultural patterns and processes. Westview Press, Boulder, 1982). Of these cultural traits, language is an important instrument of cultural identity since it is through this medium that different groups of people identify and communicate with one another and the world and express a sense of identity to a place. Often social tensions emerge when a certain segment of society feels ostracized from social and economic processes of development due to lack of knowledge of the dominant and prevalent language. Also, it is argued that the most linguistically diverse states in India are more literate and highly educated and have a positive sex ratio. Given this background, this research addresses the following three research questions: (1) What was the extent of linguistic diversity in India during the period 1971–2001? (2) What factors explain the geographical patterns in linguistic diversity in India? (3) Are linguistic diversity patterns symbiotically related? This study utilizes spatial analytic techniques such as the index of diversity and GIS analysis. The data on language were collected from the Census of India.
Keywords: India; Languages; Scheduled and nonscheduled languages; Index of diversity
The relationship between linguistic diversity and development is an age-old theme that has had implications for language planners and policy-makers for several decades. In this context Nettle (1998) undertook a geographical analysis of language diversity and climate variability at the global scale. He identified two belts of high language diversity: one runs through West and Central Africa and another runs through South and Southeast Asia and the Pacific region. Geographical areas characterized by relatively low climate variability and sufficient food production allow small groups of people to sustain themselves, and hence many languages developed. In contrast, areas with high climate variability necessitate large social networks to sustain a population, and hence less variation in languages emerges.
In the context of the modernization debate, social linguists have argued that there is a negative correlation between linguistic diversity within a country and economic development. This quantitative association is often termed the Fishman-Pool hypothesis (Nettle 2000) after the work done by Joshua Fishman (1968) and Jonathan Pool (1972). A corollary to this thinking has been the notion that linguistic diversity within the nation threatens political stability. Several strands of thinking have emanated along this theme. Geertz (1973) held that ethnolinguistic identity is a primordial attachment and hence should be construed as an independent variable in studies related to language diversity and development. In contrast to the primordial view, instrumentalists (Brass 1991) believed that language diversity was best viewed as a dependent variable. It was assumed that linguistic homogeneity was the most rational outcome for a nation managed by leaders acting in the national interest. A middle ground was struck by constructivists (Brown 1989), who argued that language and ethnolinguistic identity are social constructs that cannot easily be subjected to statistical analysis, so that qualitative analysis, such as case-by-case study, is necessary. In this context Sonntag (2017) undertook a qualitative, case-by-case study of India, Pakistan, Sri Lanka, and Nepal and concluded that political and historical context matters in interpreting the relationship among language diversity, conflict, and development. If language diversity were negatively associated with development, Sri Lanka, which followed a monolithic language regime (like Nepal and Pakistan), should have higher levels of development. India, by contrast, espoused a multilingual structure through state reorganization along the lines of linguistic federalism. However, Nettle (2000) argues for resisting the temptation to conclude that linguistic diversity is causally linked with economic performance.
India is a land characterized by “unity in diversity” amidst a multicultural society. This is symbolized by variety in culture such as different languages, religions, castes, house types, dance forms, and dietary patterns (Noble and Dutt 1982). Of these cultural traits, language is an important instrument of cultural identity since it is through this medium that different groups of people communicate with the world and express a sense of identity to a place. Often social tensions emerge when a certain segment of society feels ostracized from social and economic processes of development due to lack of knowledge of the dominant and prevalent language. This often leads to granting linguistic minorities special privileges to accommodate them in the process of mainstream national social and economic development.
Twenty-two scheduled languages of India, 2001
India is characterized by an astounding degree of linguistic diversity since each state has adopted 1 or 2 of the 22 official languages for conducting business and governance, and yet a few of the other major or minor languages may still be spoken by various population cohorts in different states. The diversity in Indian languages can be explained historically by the presence of numerous ancient kingdoms, each with its own language. Although the states disintegrated and merged with other states due to territorial expansions and wars, the language spoken by the people remained intact. Later, during colonial rule, new state lines were drawn which did not follow the former political or linguistic boundaries. Further, Srinivas (2018) noted a 45% increase in the population group that reported Hindi as their mother tongue during the decade 2001–2011. These population groups have migrated for economic reasons from the poor northern states of Bihar and Uttar Pradesh to the rich southern states of Tamil Nadu, Kerala, and Karnataka and to Maharashtra and Gujarat in western India. This reflects a changing linguistic landscape in India.
Given this overview, the purpose of this study is to explore the relationship between language and geography and the interconnections between them. This study explores the impact of relative distance and location on the diffusion and development of Indian languages across the geographical landscape. The predominant theme addressed in this chapter is whether diversity in Indian languages can be identified at various geographical scales. Linguistic diversity can be ascertained at the national level (macro level) or the state and district levels (micro levels) by examining the dissemination of languages at the national level relative to the spread or concentration of various languages across geographical units. This research specifically measures and interprets the linguistic diversity in India for the years 1971 and 2001 at the sub-national or state levels.
In this research, the census data for scheduled and nonscheduled languages have been utilized for the years 1971 and 2001. To conduct a detailed analysis, the mother tongue data have been selected because they cover the entire population. A mother tongue is defined as the language spoken in childhood by the person’s mother with the child. The Census of India provides linguistic records for scheduled and nonscheduled languages. For 1971, this research has utilized all 15 scheduled languages and aggregated all the nonscheduled languages into one group, “other languages.” So, a total of 16 mother tongue groups have been considered for the year 1971 and 23 (22 scheduled languages plus the combined nonscheduled group) for the year 2001. There were many changes between 1971 and 2001 in terms of administrative divisions of states and inclusion of languages in the Constitution of India.
The study area consists of 28 states and 7 union territories for the year 2001 and 24 states and 6 union territories for the year 1971. This research has delineated the linguistic diversity of India by calculating the index of diversity modified by Dutt et al. (1985) for all the scheduled and nonscheduled languages of all states and union territories in India for the years 1971 and 2001. The index of diversity is used to explain the linguistic diversity pattern of India. An evaluation is also made of the changes observed during the selected study period, 1971–2001. Diversity can be measured statistically by calculating an index of concentration or diversification. A diverse society with various religions, languages, and customs has a better prospect of national integration and social stability. Diversification has been studied by economic and urban geographers to explore the scale of concentration or spread in cities with respect to any economic or social characteristic such as religion, language, music, food consumption, industry, services, and high technology.
Timm (2018) computed an entropy measure of linguistic diversity across counties in the United States. He observed that Queens County, New York, had the highest score, followed by Santa Clara (home to Silicon Valley), Alameda, and San Mateo counties in the San Francisco Bay Area. Further, studies by Tress (1938), Rodgers (1957), and Conkling (1963) have analyzed patterns of diversification in cities according to the labor force absorbed within the industries in those cities. Shortridge (1976) refined the “Lorenz curve” method and calculated the religious diversity of the population in the United States. Warf and Winsberg (2008) utilized four quantitative measures (simple counts, relative dominance, Shannon’s and Simpson’s entropy, and probability-based estimates) to measure religious diversity in the United States. Dutt and Devgun (1982) used the revised index of diversity to measure religious diversity in India. This study was an extension of an earlier study to measure and interpret religious diversity patterns in Rajasthan (India) (Dutt and Devgun 1979).
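The entropy-type measures cited above can be illustrated with a minimal sketch. The function names and the speaker shares below are hypothetical and are not taken from any of the cited studies; the Simpson measure is written here in its common "one minus the sum of squared shares" form, which is an assumption about the variant those authors used.

```python
import math

def shannon_entropy(shares):
    """Shannon entropy (natural log) of language shares that sum to 1."""
    return -sum(p * math.log(p) for p in shares if p > 0)

def simpson_diversity(shares):
    """Gini-Simpson diversity: chance that two random speakers differ in mother tongue."""
    return 1.0 - sum(p * p for p in shares)

# Hypothetical regions: one with four equally sized language groups,
# one dominated by a single language.
even = [0.25, 0.25, 0.25, 0.25]
skewed = [0.85, 0.05, 0.05, 0.05]

for name, shares in (("even", even), ("skewed", skewed)):
    print(name, round(shannon_entropy(shares), 3), round(simpson_diversity(shares), 3))
```

Both measures rise as speakers are spread more evenly across languages, so the evenly divided region scores higher than the region dominated by one language.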
This study has used the revised index of diversity by Dutt et al. (1985) for computing the linguistic diversity index of India. The values closer to +1.0 indicate less diversity, whereas values closer to −1.0 indicate high diversity. This study utilizes the index of diversity measure to generate maps showing the linguistic diversity ranging from very high to very low and calculates the diversity index for all the Indian states and union territories as well. The index of linguistic diversity measures the degree of social and linguistic interaction among various linguistic groups of the society.
I = Index of concentration (or diversity if lower values are considered as more diversified)
D = Cumulative percentage total according to the rank of each language in a state
E = Cumulative percentage total assuming even percentage distribution of each language
M = Maximum cumulative percentage total assuming 100 of the frequencies in Rank 1
“D” is the cumulative percentage total according to the rank of each language in a state. The term “E” in the formula is the cumulative percentage total assuming an even percentage distribution across languages. For the year 1971, “E” was derived for 16 languages (15 scheduled + 1 nonscheduled) assuming an even percentage, i.e., 100/16 = 6.25, for each language; the final term “E” was calculated as 850. For the year 2001, 23 languages were utilized (22 scheduled + 1 nonscheduled), and assuming an even percentage, i.e., 100/23 = approximately 4.34, the final “E” was calculated as 1198.19. The term “M” in the formula is the maximum cumulative percentage total assuming 100% of the frequencies in Rank 1. It was calculated as 1600 for 1971 (16 × 100, since the data set for that year contained 16 languages) and as 2300 for 2001 (23 × 100). The last step involves mapping the values for each state and union territory in ArcGIS 10.5, classifying the values by the quantile method into five categories, i.e., “very high,” “high,” “moderate,” “low,” and “very low” diversity. The values closer to 1 are the least diverse and values closer to 0 are highly diverse. Diversity maps have thus been prepared for 1971 and 2001. These maps illustrate the concentration and the spread in the geographical pattern of the languages of all the states and union territories of India.
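The arithmetic behind “E” and “M” can be checked with a short sketch. The function names are hypothetical. The text as reproduced here does not show the formula that combines D, E, and M, so the form I = (D − E) / (M − E) used below is an assumption, chosen only because it yields 1 for complete concentration and 0 for a perfectly even distribution; it may not reproduce the published index values. With the even share left unrounded, E for 23 languages comes to 1200 rather than 1198.19, so the figure in the text appears to reflect rounding of 100/23.

```python
def even_total(n_languages):
    """E: cumulative percentage total if every language had an equal share."""
    share = 100.0 / n_languages
    return sum(share * rank for rank in range(1, n_languages + 1))

def max_total(n_languages):
    """M: cumulative percentage total if 100% of speakers were in Rank 1."""
    return 100.0 * n_languages

def ranked_total(percentages, n_languages):
    """D: cumulative percentage total with languages ranked by descending share."""
    ranked = sorted(percentages, reverse=True)
    ranked += [0.0] * (n_languages - len(ranked))  # pad languages with no speakers
    running = total = 0.0
    for share in ranked:
        running += share
        total += running
    return total

def concentration_index(percentages, n_languages):
    """ASSUMED form I = (D - E) / (M - E); not confirmed by the source text."""
    e = even_total(n_languages)
    return (ranked_total(percentages, n_languages) - e) / (max_total(n_languages) - e)

# Quantile (equal-count) grouping into the five map classes; under the reading
# "closer to 0 is highly diverse", the lowest index values get "very high".
LABELS = ["very high", "high", "moderate", "low", "very low"]

def quantile_classes(index_by_unit):
    ordered = sorted(index_by_unit, key=index_by_unit.get)
    n = len(ordered)
    return {unit: LABELS[pos * 5 // n] for pos, unit in enumerate(ordered)}

print(even_total(16), max_total(16))  # E and M for the 16 language groups of 1971
```

The quantile step here is only a rank-based approximation of a GIS quantile classification; tie handling and class breaks in a GIS package may differ.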
Linguistic Diversity in India
The major linguistic regions of India have played a vital role in maintaining India’s political, social, and cultural identities. The states are not just administrative units; they symbolize the deep historical and cultural roots from which they have evolved into their present-day landscapes (Adhikari and Kumar 2007). There are regional linguistic issues in India because of which segments of the population demand political division and fragmentation of states. All aspects of linguistic diversity cannot be addressed in this chapter, and hence the focus of this chapter is limited to examining the spatial linguistic diversity of India. For instance, imagine the core regions where people use a language as their mother tongue; around the core lies the linguistic frontier, where the predominance of the language begins to decline in a distance decay fashion. Languages such as Marathi, Bengali, and Tamil retain the core region effect in cities such as Mumbai, Kolkata, and Chennai and spread to other places via migration of people and relocation diffusion. Likewise, Punjabi and Sindhi have spread via migration of people and relocation diffusion (Kalra 2007).
India’s linguistic regions include languages belonging to more than one family and sharing traits which do not belong to the original families (Emeneau 1956). Khubchandani (1993) argues that India is an example of contrived homogeneity since small speech groups consisting of several thousand or more people are able to survive in the midst of large linguistic communities. A recent study by the United Nations Educational, Scientific and Cultural Organization (UNESCO) has examined languages around the world which are on the verge of extinction. It has identified 196 languages in India belonging to the “scheduled,” “nonscheduled,” and “other” categories which have been classified as “endangered.” Of these threatened languages, two (Manipuri and Bodo) belong to the Eighth Schedule categorization and have been classified as being “endangered” and “unsafe” (UNESCO 2010).
Diversity Analysis of Indian Languages: 1971–2001
Linguistic diversity is a direct and an indirect gauge for social, political, and economic expansion, and it exists where substantial numbers of people speak more than one language within a particular region. Therefore, a higher number of spoken languages results in high linguistic diversity and a lower number of languages spoken in a region results in a lower diversity (Kalra 2003).
The diversity index helps to depict the concentration and the spatial pattern of languages in all the states and union territories of India. In India the distribution pattern of languages changed between 1971 and 2001. This is related to the many economic, demographic, social, and political factors that have brought transformation in India over the past three decades. Many new states have been carved out and new languages have been added since 1971. These factors have also played a vital role in changing the linguistic diversity in India. Some states have developed economically and have also undergone social and political upheavals.
Three newly formed states, i.e., Jharkhand, Uttaranchal, and Chhattisgarh, did not exist in 1971.
Jammu and Kashmir had high linguistic diversity in 1971 and was a very sought-after destination due to its geographical landscape, natural beauty, and wilderness (the mighty Himalayas); but in 2001 it changed to very low diversity because of increased political instability and the continued Kashmir dispute between India and Pakistan.
The state of Karnataka (then Mysore) had high diversity in 1971. Its capital city, Bengaluru, has been called a “garden city” and a “pensioner’s paradise,” and its industrial structure has been very diverse, ranging from textiles, heavy machinery, electrical goods, and aeronautics to information technology-based services. After independence in 1947, many new schemes were launched by the central and state governments, and the city came to be acknowledged as the science and technology capital of India. The major reason for the emergence of Bengaluru as a center of manufacturing after independence was the policy of locating strategically sensitive industries like defense and electronics away from borders and coastlands (Chittranjan 2005; Nair 2005; Dittrich 2007). The location of the Indian Institute of Science, along with increasing investments by the central government in R&D establishments such as the Indian Space Research Organisation (ISRO), earned the city the epithet “science city” (Heitzman 2001). All these factors attracted talent and skilled workers from all over the country to Karnataka. An especially large number of migrants originated from neighboring states such as Tamil Nadu, Andhra Pradesh, and Kerala, suggesting a “distance decay effect” (Tobler’s law, i.e., more migrants originated from states in close proximity than from states further away). Chandigarh also had high linguistic diversity (0.47) in 1971 as it was a newly formed city after India’s independence (1947), and it became the capital of both the states of Punjab and Haryana.
Chandigarh was also called the dream city by India’s first Prime Minister (late) Shri J.L Nehru. It was one of the post-independence cities developed and planned by the famous French architect Le Corbusier. It was also considered to be one of the best experiments in urban planning and modern architecture of India. Thus, Chandigarh being the capital of two states (Punjab and Haryana), 56% of the population spoke Hindi and approximately 40% spoke Punjabi, and the rest spoke other scheduled languages.
In 1960, following linguistic disputes and the formation of the States Reorganization Commission, the then state of Bombay was divided into two new states, Gujarat (Gujarati speakers) and Maharashtra (Marathi speakers); the city of Bombay (today Mumbai) became Maharashtra’s capital. Historically, Maharashtra has enjoyed the presence of two languages, Marathi and Gujarati. The state of Maharashtra shows moderate linguistic diversity (0.70).
Delhi, the capital city, had a high diversity index of 0.50 in the year 2001. It is dominated by Hindi speakers, followed by Punjabi and Urdu speakers. It attracts migrants from nearby states because it is the capital city and a fast-growing, urbanizing metropolis. The states surrounding Delhi form a Hindi-speaking belt, and hence one would expect people from neighboring states to flock to the capital city for jobs, better quality of life, and educational opportunities. Recently, Delhi has become a technology hub and thus attracts skilled workers from all parts of the country to reside and work in the nation’s capital. The states of Maharashtra and Andhra Pradesh also show high to moderate diversity. Mumbai (Bombay) is both the capital of Maharashtra and the largest metropolis of India. It is considered India’s economic capital and a financial, commercial, cultural, and educational center that attracts migrants from all over the country. India’s Hindi film industry, called Bollywood, is located in the city and is the epitome of the entertainment industry, thereby playing a very influential role in unifying India. Hyderabad, the capital of Andhra Pradesh, also known as Cyberabad, is called the “biotech capital” of the country. The city attracts large numbers of skilled workers, scientists, and engineers from the rest of the nation. The linguistic diversity index has been calculated for the years 1971 and 2001, but the results cannot be directly compared due to differences in the number of states and languages utilized in the two years for the calculation of the diversity index. An interesting result observed for Delhi, Pondicherry, and Dadra and Nagar Haveli is that there has been an increase in their linguistic diversity (see Fig. 2). Delhi is the capital city and a technology hub attracting people from all over the country with different mother tongues.
The union territory of Pondicherry (Puducherry) is also administratively diverse as it comprises four coastal regions, viz., Puducherry, Karaikal, Mahe, and Yanam. Puducherry and Karaikal are situated on the East Coast in Tamil Nadu, Yanam in Andhra Pradesh, and Mahe on the West Coast in Kerala. Pondicherry (Puducherry) lies on the Coromandel Coast of the Bay of Bengal, about 162 km south of Chennai. It also has a rich history, as the French first established a base there in 1670, followed by Portuguese, Dutch, and English settlers. Many languages are dominant in this region, such as Tamil, Telugu, Malayalam, English, and French. These languages are spoken by a substantial number of people residing in this region.
The linguistic diversity in India portrays the region’s history, culture, and socioeconomic processes of transformation. It shows that the diffusion of languages across geographical space is intertwined with the complex histories of the region. This research delineates the geographical pattern of linguistic diversity in India for 1971 and 2001 utilizing the index of diversity approach at the state level. It sheds light on the observation that India is linguistically diverse at the national level (macro scale) but homogeneous at the state level (micro level). Almost all the states are dominated by one major language, with a few secondary languages spoken by some segments of the population. The results show remarkable change in a few states and lesser changes in others. Between 1971 and 2001 linguistic diversity increased in a few states and decreased in some, and in some states there was no change at all.
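The contrast between diversity at the national scale and homogeneity at the state scale can be illustrated with a small sketch. The two states and their speaker counts are entirely hypothetical, and the Gini-Simpson measure is used here only for brevity; it is not the index of diversity applied in this chapter.

```python
from collections import Counter

def simpson_diversity(counts):
    """Gini-Simpson diversity computed from speaker counts per language."""
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

# Hypothetical speaker counts (thousands): each state is dominated by one language.
state_a = Counter({"language_x": 900, "language_y": 100})
state_b = Counter({"language_z": 950, "language_x": 50})

# Aggregating the two states stands in for the national (macro) scale.
combined = state_a + state_b

for name, counts in (("state A", state_a), ("state B", state_b), ("combined", combined)):
    print(name, round(simpson_diversity(counts), 3))
```

Each hypothetical state scores low because one language dominates it, while the combined unit scores much higher because the dominant languages differ between states. The same dependence on the unit of aggregation applies to any diversity measure.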
The analysis has demonstrated that economically highly developed states such as Maharashtra, Delhi, and Tamil Nadu tend to have high linguistic diversity, whereas impoverished states tend to have lower linguistic diversity. It also reveals that the dominant languages of bordering states influence states that are economically developed and highly urbanized, contributing to their higher linguistic diversity. If a neighboring state uses a similar or the same language, the probability of linguistic diversity will be quite low, and vice versa. An important observation is that, apart from proximity and distance, the homogeneity of languages depends on a region’s social and political stability. Stability in a region plays a significant role in attracting people of diverse languages, as in the case of major metropolises such as Delhi, Mumbai, and Bengaluru.
Several nonscheduled languages are struggling to be included among the scheduled languages to elevate their importance in the linguistic hierarchy: economically, socially, and politically. Many nonscheduled languages have been included among the scheduled languages in the past. During 2008 there were 22 scheduled and 100 nonscheduled languages (Census of India 2001).
During the period 1971 and 2001, India has become linguistically diverse only at the national level but not at the state level. A caveat of this study is that before attempting a comparison of the two time periods, data limitation has to be taken into cognizance.
Scale is important in geographical research. A relationship or association that holds at the national level might not hold at finer geographical scales such as states, districts, cities, and small towns. This is often referred to as the modifiable areal unit problem (MAUP) in geographical research. As the data are disaggregated, causal relationships and associations can change. Therefore, further research needs to be conducted along several lines: (1) using district-level or metropolitan city-level data to explore and reveal the spread of diversity, or the lack of it, at different geographical scales; (2) employing alternative methodologies such as entropy, probability-based measures, and spatial statistics to analyze the impact of neighborhood and cluster effects on the diversity patterns of languages; and (3) evaluating the Fishman-Pool hypothesis in geographical space for a deeper understanding of the relationship between linguistic diversity and economic development at regional scales. Such an analysis will enrich the understanding of linguistic diversity patterns in India at various geographical levels.
- Adhikari, S., & Kumar, R. (2007). Linguistic regionalism and the social construction of India’s political space. In B. Thakur, G. Pomeroy, C. Cusack, & S. K. Thakur (Eds.), City society and planning: Essays in honor of Professor A.K. Dutt, Volume 2: Society (pp. 374–392). New Delhi: Concept Publishing Company.
- Brass, P. (1991). Ethnicity and nationalism: Theory and comparison. Thousand Oaks: Sage.
- Census of India. (1971). http://www.censusindia.gov.in. Accessed 30 May 2003.
- Census of India. (2001). Data on language. Statement 7. http://www.censusindia.gov.in/Census_Data_2001/Census_Data_Online/Language/Statement7.htm. Accessed 30 June 2003.
- Chittranjan, H. (2005). A handbook of Karnataka, Government of Karnataka. Bangalore: A Government of Karnataka Publication.
- Dittrich, C. (2007). Bangalore: Globalization and fragmentation in India’s high-tech capital. ASIEN, 103(S), 45–58.
- Dutt, A. K., & Devgun, S. (1979). Religious pattern of India with a factorial regionalization. GeoJournal, 3(2), 201–214.
- Dutt, A. K., & Devgun, S. (1982). Patterns of religious diversity. In A. G. Noble & A. K. Dutt (Eds.), India: Cultural patterns and processes (pp. 221–246). Boulder: Westview Press.
- Fishman, J. (1968). Some contrasts between linguistically homogeneous and linguistically heterogeneous polities. In J. Fishman, C. Ferguson, & J. Das Gupta (Eds.), Language problems of developing nations. New York: Wiley.
- Geertz, C. (1973). The interpretation of cultures. New York: Basic Books.
- Heitzman, J. (2001). Becoming a Silicon Valley: Bangalore as a milieu of innovation. Seminar, 503(July), 299–330.
- Kalra, R. (2003). Linguistic diversity changes in India: 1971–1991. M.A. thesis, Department of Geography and Planning, University of Akron, Akron.
- Kalra, R. (2007). Indian languages and their dissemination: 1971–1991. In B. Thakur, G. Pomeroy, C. Cusack, & S. K. Thakur (Eds.), City society and planning, Volume 2: Society (pp. 447–477). New Delhi: Concept Publishing Company.
- Khubchandani, L. M. (1993). India as a socio-linguistic area. In A. Ahmad (Ed.), Social structure and regional development: A social geography perspective. Jaipur: Rawat Publications.
- Nair, J. (2005). The promise of the metropolis: Bangalore’s twentieth century. New Delhi: Oxford University Press.
- Nigam, R. C. (1972). Language handbook on mother tongues in census. Calcutta: Language Division.
- Noble, A. G., & Dutt, A. K. (Eds.). (1982). India: Cultural patterns and processes. Boulder: Westview Press.
- Pool, J. (1972). National development and language diversity. In J. Fishman (Ed.), Advances in the sociology of language (Vol. 2, pp. 86–99). The Hague: Mouton.
- Sekhar, C. A. (1971). Social and cultural tables (Census of India 1971, series 1, India Part II-C i). New Delhi: Office of Registrar General India.
- Sengupta, P. (2009). Endangered languages: Some concerns. Economic and Political Weekly, 44(32), 17–19.
- Sonntag, S. (2017). Languages, regional conflicts, and economic development in South Asia. In V. Ginsburg & S. Weber (Eds.), The Palgrave handbook of economics and language (pp. 489–508). Basingstoke: Palgrave-Macmillan.
- Srinivas, A. (2018). Hindi’s migrating footprint: How India’s linguistic landscape is changing. https://www.hindustantimes.com/india-news/hindi-s-migrating-footprint-how-india-s-linguistic-landscape-is-changing/story-ssstgK9b2xR9x4srulu6OJ.htmls. Accessed 17 Oct 2018.
- Timm, J. (2018). Locating linguistic diversity in the USA. https://www.jtimm.net/2018/02/10/locating-linguistic-diversity-in-the-usa/. Accessed 22 Sept 2018.
- UNESCO (United Nations Educational, Scientific and Cultural Organization). (2010). Atlas of the world’s languages in danger. Paris: United Nations Educational, Scientific and Cultural Organization; 3 Revised Edition.