Introduction

India's population is increasing rapidly, and providing proper education for this large population is a critical challenge for the Government of India (GoI). According to the Right to Education (RTE) Act, education is a fundamental right for children between 6 and 14 years of age. Geographic Information System (GIS) based school mapping is required at the village level to understand the status of academic and infrastructure facilities and the related problems in schools. Such solutions will be especially beneficial in the effective implementation and monitoring of the National Education Policy (NEP) 2020. To improve the quality of primary and secondary education, the GoI has taken several initiatives such as the establishment of new schools, recruitment of teachers and improvement of infrastructure facilities. Between 1950–1951 and 2010–2011, the number of primary schools increased 3.5 times while the number of upper primary schools grew 33 times (Agrawal & Gupta, 2020).

Currently, more than 80% of data has a spatial component (Klinkenberg, 2003). In the digital century, the use of GIS to handle spatial data is increasing (Agrawal et al., 2022). GIS can be defined as “a decision support system involving the integration of spatially referenced data in a problem solving environment” (Cowen, 1988). GIS is highly useful for educational administrators. It supports planning and monitoring through analyses such as data visualization, pattern identification and school catchment area delineation (Langley, 2001). GIS has been used in various application fields. For example, it can be used in health for the establishment of new facilities, which is a multi-criteria decision making problem (Tripathi et al., 2021). Mishra et al. (2019) have given a detailed methodology for the opening of health care units in rural India using spatial weighting and multi-criteria decision making. This work was extended to locate facility centers so as to maximize accessibility and reduce the travel cost for individuals with multiple trip purposes (Mishra & Sahu, 2021). Activity centers with the facilities of a health-care center, high school, bank and market were further located by solving a location allocation problem for infrastructure development (Mishra et al., 2021).

GIS in education involves school mapping, where an integrated geographic database of schools is built by including school location data, socio-economic data and demographic data (Agrawal & Gupta, 2016; Mulaku & Nyadimo, 2011). Spatial queries involving operations such as overlay and buffer can be performed on this database. GIS can be used to develop a model that determines the optimal path for transportation (Ghose et al., 2006). Students and school vehicle drivers can use it as a decision support tool to find the optimal path from the source to the destination (Singh et al., 2019). Using an optimal path based on multiple parameters, the fastest route for a student to commute between home and school can be determined. Bus drivers can also follow this optimal path to pick up students (Pettit et al., 2018).

Transportation is an essential aspect of child education. The journey from home to school is a significant part of the school experience; it is not just a means of access to education but also affects school attendance (Gristy & Johnson, 2018). Most of the time, there are several paths to reach the school. The selection of an optimal path for transportation is one of the important issues for parents sending their children to school. The cost involved in transportation is a concern for parents in lower and middle-income groups. Sometimes, it hampers child education, resulting in absenteeism and school dropout (Aggarwal, 2018). The route selected for the journey is not always the shortest; the choice is influenced by several factors such as residential density, traffic exposure, traffic lights, route directness and slope (Ikeda et al., 2018). In school commutation route selection, there are some additional factors such as mode of travel, vehicle ownership, parents' workplace location, and pupil age and gender. GIS can provide solutions that help to reduce these problems.

Educating a large population needs support. Opening new schools is not the only solution. There must be proper monitoring of schools so that possible problems can be identified. These schools are usually funded by taxpayers' money, and a certain level of quality is expected from them. Several efforts have been made in this direction, but a GIS based study is required to pinpoint the locations where problems exist. It will help to reveal hidden patterns in the data set that are otherwise not visible. The following research gaps are identified through the literature review: (i) Studies focused on spatial analysis of schools based on attributes are very limited. It is necessary to recognize the geographical context in which school-related problems occur to gain in-depth insight. Many school-related problems need to be addressed through spatial analysis of enrollment, gender ratio, teacher-taught ratio, etc. (ii) There is a need to find the optimal school travel routes based on different parameters. Little attention has been given to providing the visit sequence using GIS techniques such as the Travelling Salesman Problem (TSP). An effective evaluation of school visits can help officials in visiting all schools. There is a need to improve school accessibility, reduce school travel costs and cater to officials' need to visit all the schools. (iii) The literature on the use of GIS in education (as given in Sect. 2) shows that a sizeable number of studies have been conducted in western countries. The authors came across very few studies in the Indian context that have examined the role of GIS in school analysis and transportation routes. This study is an attempt to fill these research gaps.

The aim of this paper is to evaluate primary schools and their accessibility using GIS techniques. In order to achieve this aim, the following objectives are formulated: 1) to create a geospatial database of schools, their categories and related infrastructure; 2) to perform spatial analysis on the school data under the GIS environment; and 3) to study the road connectivity of schools and find the optimal path.

Literature review

This section explores the latest GIS research carried out for schools. This review is carried out in two parts. The first covers the role of GIS in the education sector. It starts with the role of GIS in schools that includes site selection, redistricting, distribution of schools, assessment of school surroundings and evaluation of student health. The second part is about the use of GIS in school commutation. In this section, GIS studies on school commutation are surveyed. It involves the transport mode selection, route analysis and en-route safety.

Role of GIS in schools

Site selection, redistricting and distribution of schools

The opening of new schools is necessary for academic growth, as it makes schools more easily accessible. Elementary education is mandatory in many countries. In India, the RTE Act was enacted to universalize education among children. This needs a large number of schools. The opening of new schools must therefore take accessibility into account. Often, the government gives specific rules for opening new schools based on distance, population density, etc. These rules act as a nucleus for school site selection decisions. GIS can help with this type of meticulous planning.

Al-Sabbagh (2020) used the location allocation model to open new schools in Mansura city, Egypt, to improve accessibility. Several parameters were considered: the locations of existing schools, residential areas, demand points and the transportation network. The study area was divided into a grid of 100 × 100 m square cells. Only the cells that met the criteria were retained, while the others were discarded. Finally, the school sites were chosen using location allocation models and statistical analysis. Another study, on preschool site selection, was carried out by Köse et al. (2021) in a province of Turkey. It examined the distribution of existing schools to assess the requirement for new schools. The authors used existing preschools, population, road and Land Use Land Cover (LULC) data to find the locations of potential preschools.
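
As a rough illustration of the grid-based screening and location allocation idea described above, the following Python sketch picks sites greedily so that the total distance from demand points to their nearest site is as small as possible. It is not the model used by Al-Sabbagh (2020); the coordinates, the greedy heuristic and the function names are assumptions made for illustration only.

```python
import math


def total_distance(demand, sites):
    """Sum of each demand point's distance to its nearest open site."""
    return sum(min(math.dist(d, s) for s in sites) for d in demand)


def greedy_location_allocation(demand, candidates, p):
    """Pick p candidate cells one at a time, each time adding the cell
    that lowers the total demand-to-nearest-site distance the most."""
    chosen = []
    remaining = list(candidates)
    for _ in range(p):
        best = min(remaining, key=lambda c: total_distance(demand, chosen + [c]))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical centres (in metres) of 100 m grid cells that passed screening
candidates = [(50, 50), (150, 50), (950, 950), (50, 950)]
# Hypothetical demand points, e.g. residential clusters
demand = [(40, 60), (60, 40), (940, 960), (960, 940)]

sites = greedy_location_allocation(demand, candidates, p=2)
print(sites)
```

A greedy search of this kind is simple but does not guarantee the best possible set of sites; dedicated location allocation solvers in GIS packages typically use more refined heuristics and network distances rather than straight-line distances.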

Related to this is the redistricting of schools where the school district's boundary is revised so that students get admitted to the nearest school. This is particularly useful in those countries where admission is given based on distance and place of residence. In literature, different parameters are described for school redistricting like travel distance, school capacity, socio-economic factors, ethnic composition of population, terrain relief and boundary contiguity (Lemberg & Church, 2000). Gerrymandering should be avoided while defining such zones (Siegel-Hawley, 2013). Caro et al. (2004) describe the GIS-based optimization technique for school redistricting. Objective functions are defined by considering the walking distance, school capacity, district contiguity, etc. An interactive interface was developed through which users can select the parameters and define their threshold. The optimization routine provides the school redistricting solution which can be modified as per user discretion.

The proper distribution of schools is required for improving the literacy rate. One such study was carried out by Tanveer et al. (2020) for Abbottabad city, Pakistan. The authors used spatial and statistical approaches to find the mismatch in school distribution. The direction ellipses are used to indicate the spread of schools and its direction. Spearman's Rho and Ripley’s K functions are used to correlate the number of students with the school distance. The Z-score test evaluates the dispersion and clustering of scores. Finally, LULC classification is used to link the distribution of schools with the number of buildings. This type of analysis can help in the proper site selection of schools.
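
To make the rank correlation mentioned above concrete, the sketch below computes Spearman's Rho as the Pearson correlation of ranks, using only the Python standard library. The distance and student figures are invented for illustration and are not taken from Tanveer et al. (2020); the function names are likewise hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical values: distance to school (km) and number of students
distance_km = [0.5, 1.0, 2.0, 3.5, 5.0]
students = [120, 95, 60, 40, 10]

rho = spearman_rho(distance_km, students)
print(round(rho, 3))
```

Because the invented student counts fall steadily as distance grows, the coefficient here is close to -1; real data would normally give a weaker relationship.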

The accessibility and efficiency of schools are essential for the growth of education. The spatial distribution of schools and the feasibility of opening new schools can be handled by GIS. Several studies have been carried out that take different factors into consideration. Yang et al. (2017) found that regions with mountainous terrain lag behind those located on plains. Their paper considered road slope, walking speed and service area to find where new schools can be established.

Assessment of the school surrounding

The surroundings of schools have a substantial impact on pupils. Rejuvenation and greening improve the environment of a place. School campuses are often well planted locations covered with turf grasses that create the oasis effect (Taha et al., 1997). This helps to reduce the impact of urban heat islands (Dutta et al., 2021). The LULC of schools can be studied at the micro scale (schoolyard) or the macro scale (neighborhood of schools). Schulman and Peters (2008) gave a comprehensive study on the LULC of schoolyards. They studied 258 schools located in three different cities of the United States (US). Aerial photographs were used, from which school building footprints were delineated through digitization and LULC was extracted by classification. ESRI ArcView software was used for these tasks and SPSS software was used for statistical calculations. Spearman's Rho correlation analysis gave a negative correlation between impervious surface and schoolyard size and a positive correlation between vegetation and schoolyard size.

Similarly, Agrawal and Gupta (2017) studied LULC in the region within 500 m of schools. A total of 204 schools in India were taken for this study. Image processing and GIS techniques were used to assess LULC in the surrounding regions of the schools. The results helped to determine the environment, hazard and transportation related issues around the schools. Another study concerns the indoor air quality of schools. Sivri et al. (2020) showed the concentration of culturable airborne bacteria and culturable airborne fungi inside schools. In this work, GIS is used as a demonstration tool to depict the air quality inside the school. A detailed 2D floor plan of the school is prepared and imported into the GIS environment as vector data. Thematic layers are generated to show the distribution of indoor air pollutants in schools. The spatio-temporal relationship between pollutants and atmospheric conditions is also depicted through visual analysis.

Apart from the outdoor space, GIS is also used in the indoor space design of schools. Indoor space management in educational institutes has become critical during the COVID-19 pandemic. Li et al. (2021) used GIS for the library floor management of the Purdue University Libraries in the US. In this research, student activity data is collected through a web interface using mobile devices. The authors combined this data with the digitized classrooms, study rooms, laboratories, floor plan, etc. A color coded library floor plan is generated based on the number of students, number of groups, technology used, etc. This can help in the proper management and allocation of resources. Waktola (2015) described the spatial dynamics of attributes inside the classroom. The article considers the seating position, proximity to the instructor, distance from the projector, etc. Their impacts on test scores and attendance are also demonstrated. Overall, GIS can be used to give the optimal location of anything inside the school, such as Wi-Fi (Ebner et al., 2017) or any other resource (Godfrey & Stoddart, 2018).

The selection of a school is a big challenge for parents as it affects the professional accomplishment of the pupils. It is a multi criteria decision making problem that can be supported by GIS (Yoon et al., 2020). Baykasoglu and Durmusoglu (2014) used an analytical network process and fuzzy cognitive maps to select the private primary schools. They identified nine criteria for the selection of a school. These criteria are education quality, popularity, technology, school activities, foreign language, physical facilities, security, health and transportation service. Each criterion has 3 to 10 sub criteria. Fuzzy cognitive maps and the DEMATEL tool are used for the evaluation of interrelationships among criteria. Analytical network process is then used for weight assignment and assessment of alternatives. Zhang and Ruther (2021) have studied ethnic and racial bias in school selection. They have suggested the spatial regression and geographically weighted regression approach for understanding the spatial disparities in selecting private schools over public schools by certain ethnic groups.

GIS studies on school commutation

School travel has a significant share in the traffic of most of the cities. It is one of the critical components in child education. Actual data is required to study it. In the past, hand drawn maps were used to show the actual path followed to reach a destination. After the advent of the Global Navigation Satellite System (GNSS), especially GPS, the concept of participatory mapping has become popular. In several studies, participatory mapping is used to gather the children's perception, interpretation, attitude, action and experience regarding their route on the home to school trip (Wilson et al., 2019).

School travel involves several points like mode of transport, route selection, security, etc. (Larouche, 2015; Lee et al., 2013). Cognitive studies have been performed to understand travel related habits and decisions (Boussauw et al., 2014; Mehdizadeh et al., 2019). GIS has been used in such studies which are discussed in detail in the following sub sections.

Selection of mode of transport

A child can go to school by walking, by cycling, in a household motorized vehicle or through public transport. Active transport modes, i.e., walking and cycling, grant the benefit of regular exercises. Passive transport modes, i.e., the use of motorized vehicles, are fast and time saving. Selection of mode of travel for school is a complex decision making problem that involves social, economic and geographical parameters (Henne et al., 2014).

Active transport is considered a regular form of exercise for children that helps to prevent conditions like obesity, diabetes, etc. It makes a child more self dependent and confident. Stewart (2011) has done a literature review on the factors associated with the form of transport. In this review article, 480 variables have been identified that play a role in selecting transportation mode. Among these, urban form variables are determined through GIS. These variables are the features of the built or physical environment through which a child travels to school. Here, GIS is used to record the variables related to distance, proximity, road network, sidewalk, crossing, path directness, topography, LULC, walkability and urbanization. Another study was conducted by Wati and Tranter (2015), who studied the spatial and socio-demographic factors that affect the choice of cycling as the mode of transportation. They reviewed 380 primary and 109 secondary schools of Queensland, Australia. The study revealed the correlation between the choice of school and variables like parents' employment location, type of house ownership (rented or purchased), number of vehicles in the household, school's rank, budget, catchment area, etc. The travel pattern of students and the transport mode choice are also modeled. Carver et al. (2019) studied the factors that lead to walking as the mode of transportation. They identify and define descriptive data related to children, their socio-economic condition and the built environment of the travel route. Subsequently, a multilevel multivariable Poisson regression model is applied to these factors to predict walking duration. Several other GIS based studies have been conducted to understand the factors that lead to selecting a particular mode of transport (Grasser et al., 2016; McDonald, 2007; Mehdizadeh et al., 2017; Rojas Lopez & Wong, 2017; Villa-González et al., 2018).

Several other spatial studies were conducted on the mode of transport for commuting between home and schools. Zavareh et al. (2020) have given IT solutions to inform the parents regarding the entry and exit of children in school using exploratory factor analysis, multivariate analysis of covariance, regression analysis and other statistical procedures. Deka (2013) gives the interrelation between parents' trips to the workplace and children's mode of transportation to school using the Heckman model. He and Giuliano (2017) enlisted the factors that show the possibility that parents will escort the children to school using the multinomial logistic model.

Vanwolleghem et al. (2016) illustrated the effect of transition from primary to secondary school on switching the mode of transportation by carrying out the survey through questionnaires and performing the statistical analysis on collected data. Fang and Lin (2017) showed the influence of school transportation mode on the spatial cognition of a child, like the memory of a number of landmarks, paths, orientation and structure of school routes by defining the explanatory variables and using the statistical procedures. Ikeda et al. (2019) have done the participatory mapping of school routes. They have done the structural equation modeling on 50 variables to associate them with active transport.

Route analysis

The route analysis involves two major aspects: (1) the distance between the home and school and (2) the trajectory followed for this trip. Most of the research done in this field follows a generic approach, which is summarized in Table 1. Firstly, geocoding of home and school addresses is carried out. In geocoding, the coordinates of a place are estimated using the address. This may take the help of a local inventory or the Google Maps directory. A more accurate alternative to geocoding is GPS based address determination. Once the address locations are obtained, the datasets are created. The road network is a mandatory dataset for this. It is prepared from digitization, OpenStreetMap (OSM), Topologically Integrated Geographic Encoding and Referencing (TIGER) data, etc. It can be enriched by including information on pedestrian lanes, highways, railways, crossings, etc. Afterward, network analysis is performed. It is one of the powerful features of GIS through which the path, its cost, time and delivery are ascertained between the corresponding locations. Depending on the definition of the best route, this can identify the shortest path, quickest path or least cost path. The shortest path from home to school is computed through the Dijkstra algorithm. Subsequently, alternative routes are generated based on factors like least populated path, least polluted path, minimum crowded path, etc. GIS and cartographic analysis can then be carried out, like mapping of home to school routes, their distribution, school catchment or service area, excess of schools, excess of people, etc.
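
The shortest path step in this generic workflow can be sketched with a small Python implementation of the Dijkstra algorithm. The road network below is a hypothetical toy graph with invented junction labels and segment lengths; in practice the edge cost could equally be travel time or another non-negative impedance, and GIS packages provide their own network analysis tools for this purpose.

```python
import heapq


def dijkstra(graph, source, target):
    """Return (cost, path) of the least-cost route from source to target.

    graph maps each node to a list of (neighbour, edge_cost) pairs, where
    every edge_cost is non-negative.
    """
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was already found
        for neighbour, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    if target not in best:
        return float("inf"), []
    # Walk back from the target to rebuild the route
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[target], path[::-1]


# Hypothetical road network: junction labels and segment lengths in metres
roads = {
    "home": [("A", 400), ("B", 250)],
    "A": [("home", 400), ("school", 300)],
    "B": [("home", 250), ("C", 200)],
    "C": [("B", 200), ("school", 350)],
    "school": [("A", 300), ("C", 350)],
}

cost, path = dijkstra(roads, "home", "school")
print(cost, path)
```

In this toy network the route through junction A (700 m) is shorter than the route through B and C (800 m). Replacing the segment lengths with another cost, such as a pollution or crowding score, would yield the kind of alternative routes mentioned in the text.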

Table 1 Tools, technologies and devices used in the route analysis

The travel behavior, mobility pattern and route selection for a home to school trip depend on several factors. Badland et al. (2010) compare the results of GPS measured actual commute path with that estimated through GIS. In this work, the addresses of the participants are geocoded. The ArcView’s closest facility tool of network analyst extension is used to find the shortest route from home to school. Some variables, namely residential density, LULC and street connectivity, are considered around the buffer zone of the actual and estimated path to find the agreement between them. Another more sophisticated study was carried out for Toronto, Canada where the shortest path to school is compared with the lower ultrafine particle route (Elford & Adams, 2019). Here, a road network is created and additional data of sidewalk is included for pedestrian travel with the help of OSM and local data inventory. Along with this, LULC, DEM, participant addresses and school boundaries are included. The use of barriers like highways and railways has made the results more accurate. The ArcGIS network analysis extension is used to generate the routes. The R scripts are used for the inclusion of ventilation rate, speed of walking, intake of ultra fine particles and energy expenditure.

Ikeda et al. (2018) collected children's hand-drawn actual school commuting routes through an online participatory GIS tool and compared them with the GIS-estimated shortest route. SoftGIS, an emerging online participatory software, is used to obtain the actual route. The GIS-estimated shortest route is computed by ArcGIS software. The overlap of both routes is analyzed statistically on seven spatial overlap parameters that lie inside the 80 m buffer around the actual and estimated routes. Dessing et al. (2016) have discussed the methodological issues in past studies. These include accuracy issues in geocoding; scale or zoning issues in the buffers that are built to study the surrounding environment of the schools, homes or routes; disagreement between the GIS derived shortest path and the actual path; missing or inaccurate details related to pedestrian name barrier, etc. In this study, children wear a GPS receiver through which the actual path is collected. The road network is obtained from the digitization of topographic maps, which is corrected using OSM and satellite images. They have taken several built environment variables under four broad categories, namely LULC, aesthetics, traffic and street type. A comparison is made between the actual route and the derived shortest route based on these parameters. This analysis is done separately for walking routes and cycling routes. It is identified that both pedestrians and cyclists showed a preference for residential roads and an avoidance of paths that have more accidents and zebra crossings.
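
One simple way to quantify how far an estimated route agrees with an actual route is to measure the share of its vertices that fall inside a buffer around the other route. The Python sketch below does this for an 80 m buffer. It is a simplified, vertex-based stand-in and not one of the seven overlap parameters used by Ikeda et al. (2018); the coordinates are invented and assumed to be in projected metres.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b (planar coordinates)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its end points
    t = ((px - ax) * dx + (py - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def share_within_buffer(route, reference, buffer_m):
    """Fraction of route vertices lying within buffer_m of the reference line."""
    segments = list(zip(reference, reference[1:]))
    inside = sum(
        1
        for p in route
        if min(point_segment_distance(p, a, b) for a, b in segments) <= buffer_m
    )
    return inside / len(route)


# Hypothetical routes in projected metres
actual = [(0, 0), (500, 0), (1000, 0)]
estimated = [(0, 0), (250, 50), (500, 120), (750, 60), (1000, 0)]

overlap = share_within_buffer(estimated, actual, buffer_m=80)
print(overlap)
```

Here four of the five estimated vertices lie within 80 m of the actual route, so the share is 0.8. A length-based overlap computed with polygon buffers in a GIS package would be more faithful to the published analyses.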

The use of bus service is also prevalent to pick and drop the pupils. Mohandas et al. (2021) have conducted a study on the vehicle routing problem for school buses. They proposed the bus schedule, number of buses, best location of stops and optimum route for buses. The web GIS interface is developed to show how bus location information can be circulated to the children and their parents. Casas et al. (2008) have done the efficiency assessment of the bus routes. They use data envelopment analysis based on linear programming techniques. They have developed a spatial decision support system for bus routing. They have used the redistricting toolbox in ArcGIS. They provided the optimized location of bus stops, assigned students to these bus stops and solved the routing problem using the TSP.
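
The visit-sequence idea behind the TSP can be illustrated with a brute-force search over all stop orders. The Python sketch below is only practical for a handful of stops and uses straight-line distances; the stop coordinates are hypothetical, and it does not reproduce the routing approach of Casas et al. (2008), which would rely on road network distances and dedicated solvers.

```python
import itertools
import math


def tour_length(order, points):
    """Length of the closed tour visiting points in the given order."""
    return sum(
        math.dist(points[order[i]], points[order[(i + 1) % len(order)]])
        for i in range(len(order))
    )


def shortest_tour(points, start=0):
    """Exhaustively search every visit order that begins at `start`."""
    others = [i for i in range(len(points)) if i != start]
    best_order, best_length = None, float("inf")
    for perm in itertools.permutations(others):
        order = (start,) + perm
        length = tour_length(order, points)
        if length < best_length:
            best_order, best_length = order, length
    return list(best_order), best_length


# Hypothetical stop coordinates in metres; index 0 is the depot or school
stops = [(0, 0), (0, 300), (400, 300), (400, 0)]

order, length = shortest_tour(stops)
print(order, length)
```

For the four stops above, which form a 400 m by 300 m rectangle, the best closed tour follows the perimeter and is 1400 m long. The number of orders grows factorially with the number of stops, so larger problems need heuristics such as nearest neighbour or 2-opt.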

Some of the studies have been conducted in the Indian context (Tetali et al., 2015). Singh and Vasudevan (2018) have listed the parameters that are involved in school transport selection in the Indian scenario, including peer influence, age, gender, parents' employment, income, vehicle ownership, etc. They found that walking and bicycling are the major transportation modes for students from low-income groups. These modes are preferred when the school is in close proximity. Manoj et al. (2020) also carried out a study on the school travel time in Bengaluru city, India. They have given a detailed list of variables that affect the school trip. The stochastic production frontier approach is used in this study to provide a correlation coefficient of different parameters with travel time. Both of these studies mentioned the gender effect on school trip commutation.

Assessment of en-route safety and security

Child safety and security during the commutation between the home and school is a primary concern of parents. There are two time windows where special attention on traffic is desired. One is the morning or starting time of school and the second is the evening or closing time. At these times, school management, local administration and police face problems. Heavy vehicles must be avoided around the school, particularly at the opening and closing times. The traffic flow must be regulated to avoid traffic conditions and jams at the entry and exit time. Vehicle speed must be controlled and traffic should be calmed down at this time. GIS based studies have been carried out in this direction.

Briz-Redón et al. (2019) estimated the accidents in the proximity of schools in Valencia, Spain. The authors prepared the data set of accidents, schools and defined the covariates for traffic accidents. The R programming language and its packages are used to perform the study. The Monte Carlo simulation is used to get the observed vs expected ratio of accidents. To analyze the result at a macroscopic level, hexagonal units were drawn around the school. The hexagonal unit is preferred over the circular buffer in order to reduce the Modifiable Areal Unit Problem (MAUP). The conditional autoregression, multivariate regression and logistic regression are used to analyze the accidents inside the hexagon units around the schools.
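
The idea of an observed versus expected ratio obtained by Monte Carlo simulation can be sketched as follows. This Python example assumes a very simple null model in which each accident falls near a school with a probability equal to the share of the road network lying near schools. This null model, the figures used and the function name are assumptions for illustration and are not necessarily the procedure followed by Briz-Redón et al. (2019).

```python
import random


def observed_expected_ratio(n_accidents, observed_near, share_near,
                            n_simulations=2000, seed=42):
    """Compare the observed number of accidents near schools with the count
    expected if accidents fell at random.

    share_near is the assumed fraction of the road network lying near
    schools, so under the random model each accident lands near a school
    with that probability.
    """
    rng = random.Random(seed)
    simulated = []
    for _ in range(n_simulations):
        count = sum(1 for _ in range(n_accidents) if rng.random() < share_near)
        simulated.append(count)
    expected = sum(simulated) / n_simulations
    # One-sided Monte Carlo p-value: how often chance alone reaches the observed count
    p_value = (1 + sum(c >= observed_near for c in simulated)) / (n_simulations + 1)
    return observed_near / expected, p_value


# Hypothetical figures: 500 accidents, 90 of them near schools,
# with 12% of the road network lying near schools
ratio, p_value = observed_expected_ratio(500, 90, 0.12)
print(round(ratio, 2), round(p_value, 4))
```

A ratio above 1 indicates more accidents near schools than the random model would produce. With the invented figures the expected count is about 60, so the ratio is roughly 1.5.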

Heydari et al. (2020) studied the safety of transportation facilities around the schools as compared to other places in Montreal, Canada. Different types of data were collected in the proximity of schools, such as number of pedestrians, number of pedestrian injuries, traffic volume, bus stops, LULC, etc. They developed the Bayesian heterogeneous endogenous econometric framework to determine the effect of school proximity on pedestrian safety.

Clifton and Kreamer-Fults (2007) have used the ordinary least square method to estimate pedestrian crashes. The circular buffers are created around the schools. The dependent variables related to school, LULC, road, vehicle, etc., are defined and then used to estimate the crash severity. Corazza et al. (2020) studied sidewalk safety for pedestrians on home to school trips. The authors have given a detailed checklist of wellbeing, safety, security, usage and appearance. This helps to compute the sidewalk condition index that is used to find the approaching trajectories to schools. Chevalier and Charlemagne (2020) focused on the pupil's safety while crossing the road. In the paper, detailed route analysis is carried out to pinpoint the location of the danger. The infrastructure design solutions are proposed to reduce the perceived danger.

In 2005, the US started the Safe Routes to School (SRTS) program to provide more safety and reduce injury in school going children. Some GIS based research is carried out for this program. Boarnet et al. (2005) have done the cross-section evaluation of roads in California to ensure the safety of walking or cycling pupils. DiMaggio and Li (2013) also carried out the study for SRTS but for motorized accidents. They used ArcGIS and R programming to build the database and perform statistical analysis.

Several other road and traffic safety studies are conducted. They have different perspectives like traffic safety concerns of parents and gender based route selection (Guliani et al., 2015; Oluyomi et al., 2014); assign walkability to safe routes using building index model and GIS (Kim et al., 2016); and role of school location and nearby environment on child safety (Yu & Zhu, 2016).

Apart from traffic, road and sidewalk safety, various studies consider other aspects. Milam et al. (2014) have studied the association between alcohol outlets on the way to school and route safety perception. They also investigated a child's exposure to alcohol, drugs, tobacco and other harmful things in the presence of such outlets on the school route. Rossen et al. (2013) demonstrated the effect of en-route food outlets on the child's health with the help of ArcGIS and SATA. Stewart et al. (2020) have done a proximity based analysis of roadway emissions and their effect on the air quality of schools. Downs (2014) has mapped criminals and offenders. Potential paths are then generated by using geocoding, network analysis and Voronoi diagrams. This has provided a spatial context to ensure the safety of students. These studies have shown that GIS is used in unique and innovative ways to ensure the safety of school going children.

Study area

The Uttar Pradesh state of India is divided into 75 districts. The literacy rate of Uttar Pradesh is comparatively lower than that of other states. Compared to the national average, the poverty ratio is higher here while the educational index is lower (Tilak, 2018). In Uttar Pradesh, the aggregate literacy rate is 69.7%, while for children in the 10–14 age group, it is 87.7% (Shukla & Mishra, 2017). Over time, a large number of schools have been established here. In 2018, it had the highest number of schools (2,43,014 schools) in the country.

In the Prayagraj district of Uttar Pradesh, the literacy rate is 72.32% as per the 2011 census. The female literacy rate is just 60.97% in this district. In the Prayagraj district, the literacy rate for the age group (0–6) is 72.32%, while the literacy rate is about 70.23% for male children and just 51.95% for female children. There are 6,337 schools in this district. Although the education status of the state has improved over the years, more work and support are still needed in this direction. Therefore, this region is selected as the study area of this research work.

The geographic extent of the Prayagraj district is shown in Fig. 1. The total area of the Prayagraj district is 5,482 km². According to the Census of India (2011), the population of Prayagraj district is 5,954,391. The population density is about 1090 people per km² in the district. Prayagraj district is divided into eight tehsils, which are further divided into 20 blocks. There are 3077 revenue villages, which are the central part of this research work.

Fig. 1 Study area location map

Methodology

The primary purpose of this research work is to perform the spatial and route analysis of primary schools of the Prayagraj district of Uttar Pradesh. The work highlights the possible use of GIS in characterizing the available school data and in improving education quality in the future through GIS based spatial analysis. Another purpose of the work is to find the least cost optimal path for schools by using multiple parameters.

The methodology flow chart is shown in Fig. 2. It involves collecting spatial and attribute data, including boundary data, road data, school location data, school raw data and census data. All the operations performed in database creation and the subsequent steps are specified in the flowchart. These include georeferencing, mosaicing, digitization, network preparation, etc.

Fig. 2 Methodology adopted

These data are collected from different sources. Once the data collection is completed, the data are converted into GIS format. Attribute data are then linked with the spatial data through the relevant fields. All spatial and non spatial data are combined using the school code and census code. After completing the data collection process, three databases are created: one for boundary data, one for road data and one for schools data. The integrated geographic database of the Prayagraj district is then created by combining all three databases. The spatial analysis and network analysis are carried out on this database.

Several GIS based methods are discussed in the literature review section. Based on their suitability for the study area, methods such as the TSP are selected from among them. The integrated geographic database of Prayagraj district is used for performing the spatial analysis on village level census data, primary education data and the road network dataset.

Creation of integrated geographic database

Data collection

Different kinds of data are used in this research work. Their details are given below:

Administrative boundary

The tehsil boundary map of the Prayagraj district is prepared using Survey of India (SOI) topographic maps. The sheet numbers of these maps are G44P10, G44P11, G44P12, G44P14, G44P15, G44P16, G44Q2, G44Q3, G44Q4, G44Q6, G44Q7, G44Q8 and G44W1. These maps are georeferenced, mosaiced and digitized with the help of Quantum GIS (QGIS) software to produce the district and tehsil boundary maps. The village boundary map of the Prayagraj district is collected from the ‘District Census Handbook Allahabad’, which is downloaded from the Census of India website. This map was in .pdf format. The collected village boundary map is registered on the topographic map. After this, the village boundary map is digitized and the village boundary shapefile is generated.

Census data

The census data of the Prayagraj district and villages is downloaded from the Census of India website. The census data is used to perform the demographic and socio-economic analysis on the Prayagraj district. There are ninety-six parameters in census data. Out of these parameters, nine parameters are used: village name, village code, block name, total population, total male population, total female population, total literate population, literate male population and literate female population. Based on these parameters, an attribute table of the village boundary layer is prepared in MS Excel. It is later joined with the village boundary shapefile using the common field of village code.

Road data

Prayagraj district road data is downloaded from the OSM website (https://download.bbbike.org/osm/extract/planet_80.08,24.38_83.56,26.57.osm.shp.zip). The datum of this data is WGS 84. This downloaded data is in polyline shapefile format.

To apply the least cost optimal path analysis, a road network dataset is required. Using network analysis tools in QGIS software, the road data are converted into a road network dataset. Some nodes are missing in the prepared dataset, which causes errors when finding the optimal path from the source point to the destination point. To remove this error, the data are cleaned manually by adding new nodes and joining the disconnected road polylines. To find the least cost optimal path based on population density, a population density weight must be assigned to the road data. Therefore, the prepared road network dataset is split by the village boundaries so that the population density weight can be assigned to the road data.

According to the ‘Uttar Pradesh Public Works Department’ (district wise details of road length in Uttar Pradesh (as of 04/01/2021)), the Prayagraj district has six national highways (NH076E, NH024B, NH0096, NH0076, NH0027 and NH0002), five major state roads (MD162E, MD146E, MD133E, MD130E and MD121E) and two state highways (SH0102 and SH0007), including the Ganga expressway. The Prayagraj Sadar area has a higher road density than any other tehsil. It is also connected with all other transport lines like the airport, railway line and waterway.

School location

The spatial analysis is carried out on the 4,878 schools of the Prayagraj district. The location of 650 schools is collected through the GPS based field survey. These schools are located in the Prayagraj Sadar and Bara tehsil. The remaining 4,228 school locations are collected from the School GeoPortal website (https://schoolgis.nic.in/). Figure 3 shows the location of schools.

Fig. 3 Map showing the location of all 4,878 schools of the Prayagraj district

School attribute data

To perform spatial analysis on the schools of the Prayagraj district, the attribute data of schools is required. This data is collected from the District Information System for Education (DISE) website, which was developed by the National Institute of Educational Planning and Administration (NIEPA). DISE provides data sheets to primary schools. The schools fill in these data sheets, which are then gathered and published by DISE. The school raw data and school location data are linked. Both the spatial and attribute data have a common unique field, the school code. Therefore, the attribute table join operation is performed using this eleven-digit code.
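The join on the school code can be sketched in a few lines. The following is a minimal illustration in plain Python rather than the GIS join tool used in the study; the function name `join_by_code`, the field names and the sample codes are hypothetical. Codes are kept as strings so that leading zeros in an eleven-digit code are not lost.

```python
def join_by_code(locations, attributes, key="school_code"):
    """Attach attribute rows to location rows that share the same key.

    Returns the joined rows and the keys of locations with no attributes.
    """
    attr_index = {row[key]: row for row in attributes}
    joined, unmatched = [], []
    for loc in locations:
        attrs = attr_index.get(loc[key])
        if attrs is None:
            unmatched.append(loc[key])  # location without attribute data
            continue
        joined.append({**attrs, **loc})
    return joined, unmatched


# Hypothetical records; the codes and values are made up for illustration.
locations = [
    {"school_code": "09450100101", "lon": 81.85, "lat": 25.44},
    {"school_code": "09450100102", "lon": 81.90, "lat": 25.40},
]
attributes = [
    {"school_code": "09450100101", "boys": 60, "girls": 55, "teachers": 3},
]

joined, unmatched = join_by_code(locations, attributes)
print(joined)
print(unmatched)
```

Reporting the unmatched codes separately makes it easy to spot schools whose location is known but whose attribute sheet is missing.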

Preparation of integrated geographic database

Spatial and non spatial data are collected for this work. Road data, boundary data and school location data are spatial data. DISE (DISE 2017–18) and census data are non-spatial data. Spatial and non spatial data are joined using the common fields in the attribute tables, viz, school code and census code. All the datasets are transformed into a common geographic and projected coordinate system. After completing the data collection process, three databases are created: one for boundary data, one for road data and one for schools data. The integrated geographic database of Prayagraj district is the combination of these three databases. The spatial analysis is carried out on this integrated geographic database.

Analysis of schools data under GIS environment

Literacy map

Education is necessary for human development. In the Prayagraj district, there are 3,365,727 literates, of whom 2,199,340 are male and 1,466,387 are female. The literacy rate of the Prayagraj district is 68.8%. In the district, 81.0% of males and 55.5% of females are literate.

The literacy rate map of the Prayagraj district is prepared and shown in Fig. 4. The literacy rate is classified into ten categories, as shown in Table 2. In most villages, the literacy rate lies between 51% and 70%: 2,237 villages fall in this range. There are 673 villages whose literacy rate is below 50% and 158 villages where it is above 70%. The literacy rate is more than 90% in only three villages. In Tikrohini Khurd, Babhnpur Deragadai and Ledar, the literacy rate is 90.63%, 91.17% and 92.42%, respectively.

Table 2 Literacy rate of Prayagraj district
Fig. 4 Village wise literacy rate map

Gender ratio map

To analyze how many girls are enrolled in a school per thousand boys, a gender ratio analysis is done. Based on the gender ratio, users can explore school types such as boys-only schools, girls-only schools, etc. In the ideal case, the gender ratio should be equal to 1:1. The actual condition is given in Table 3.

Table 3 Students gender ratio

The students' gender ratio map is shown in Fig. 5. In only 171 schools, the number of girls is equal to the number of boys. These schools are shown in cyan on the map. There are only 43 boys' schools, where only boys are taught, and 116 girls' schools, where only girls are taught. There are 2,246 schools in which girls outnumber boys and 2,302 schools in which girls are fewer than boys. These statistics show a substantial improvement in the education status of girl children. In India, the total literacy rate increased by 55.71% from 1951 to 2011. Female literacy has grown enormously, jumping from a meager 8.86% in 1951 to 65.46% in 2011. Thus, awareness of girl child education, availability of schools and government initiatives are the possible factors behind this improvement.
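The per-school gender ratio and the five school categories reported above (171 + 43 + 116 + 2,246 + 2,302 = 4,878 schools) can be derived from enrolment counts as in the following sketch. The function names and category labels are illustrative and are not taken from the study's GIS workflow.

```python
def girls_per_thousand_boys(girls, boys):
    """Gender ratio expressed as girls enrolled per thousand boys."""
    if boys == 0:
        return None  # undefined for girls-only or empty schools
    return 1000.0 * girls / boys


def gender_category(girls, boys):
    """Assign a school to one of the categories used on the map."""
    if girls == 0 and boys == 0:
        return "no enrolment"
    if girls == 0:
        return "boys only"
    if boys == 0:
        return "girls only"
    if girls == boys:
        return "equal"
    return "more girls" if girls > boys else "more boys"


# Hypothetical enrolment figures for three schools.
print(girls_per_thousand_boys(45, 50), gender_category(45, 50))
print(girls_per_thousand_boys(30, 30), gender_category(30, 30))
print(girls_per_thousand_boys(20, 0), gender_category(20, 0))
```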

Fig. 5 Students gender ratio map of schools

Student teacher ratio map

The student teacher ratio is shown in Table 4 and the student teacher ratio map is given in Fig. 6. A student teacher ratio of 40 or less in primary schools is considered good. In 3,584 schools, the student teacher ratio is less than or equal to 40. These schools are shown in green on the map. There are 222 schools where there is a single teacher for more than 100 students. These schools are shown in black on the map. In the city region, this ratio is better, and its effect is manifested in the literacy rate. This student teacher ratio analysis helps parents admit their children to better schools. This map can also help the government to improve the quality of education: based on this analysis, the government can increase the number of teachers in schools as required.
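The two thresholds stated above (a ratio of 40 or less, and more than 100 students per teacher) translate into a simple classification rule. The sketch below assumes a single intermediate class between them; the class labels and the treatment of schools with no teacher are assumptions and may differ from the classes in Table 4.

```python
def student_teacher_ratio(students, teachers):
    """Number of students per teacher; infinite if the school has no teacher."""
    if teachers == 0:
        return float("inf")
    return students / teachers


def classify_ratio(ratio):
    """Map a ratio to a class using the 40 and 100 thresholds from the text."""
    if ratio <= 40:
        return "good"
    if ratio <= 100:
        return "above norm"  # assumed intermediate class
    return "critical"


# Hypothetical schools: (students, teachers).
for students, teachers in [(120, 3), (180, 3), (250, 2)]:
    ratio = student_teacher_ratio(students, teachers)
    print(students, teachers, ratio, classify_ratio(ratio))
```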

Table 4 Students teacher ratio
Fig. 6 Students teacher ratio map

Students toilet ratio map

The data on the number of students per toilet is shown in Table 5. The students toilet ratio map is generated using DISE facility data. According to the DISE 2017–18 data, only 13 schools have no toilet facility. These schools are shown in black in the students toilet ratio map, given in Fig. 7. In 302 schools, there is a single toilet for more than 200 students. There is an urgent requirement for toilets in these schools to ensure the health and hygiene of students. Proper sanitation infrastructure and a sufficient number of toilets will help to improve attendance rates. School toilets should be a top priority of the administration, as they improve the overall academic environment.

Table 5 Students toilets ratio
Fig. 7 Students toilet ratio map

GIS based analysis of school routes

Road networks analysis

A map based least cost optimal path approach is proposed in this work. The road data of the Prayagraj district is downloaded from the OSM website. This polyline data is in shapefile format. To find the least cost optimal path on the road network dataset, weights must be assigned to the roads; these weights are added to the attribute table. In the literature, several parameters are used to define the weight. Some of them are traveling length, traveling time, road condition, road type, number of tolls, number of turns on the road, safety, weather condition, comfort, congestion and directness. Table 6 gives a summary of the different parameters used for route selection.

Table 6 Parameters used in route selection in different researches

Three weights are selected in this research work to find the least cost optimal path. These weights are population density, traveling distance and traveling time. The population density of a place has a significant impact on road traffic. Usually, crowded places face more traffic congestion. In rural areas, students go to school mostly on foot or by bicycle. Therefore, the traveling distance from the source to destination, i.e., from home to school, becomes an essential factor in finding the least cost optimal path. In such a situation, the shortest path is preferred over the low traffic route.

Most of the study area lies in a rural region; therefore, traveling distance is chosen as the second parameter. Every child and guardian wants to spend the minimum time commuting from home to school, so traveling time is selected as another parameter. A combined weight function is also defined on the basis of these three parameters to find an optimal path.

The school location map of the Prayagraj district, along with the road connectivity, is shown in Fig. 8. In this map, school locations are indicated by points and road networks by lines. From this map, it is found that road density is much lower in rural areas than in urban areas. Several schools in rural areas are not connected to a road. The largest number of roads is in the city area.

Fig. 8 Schools location map with road network connectivity

Population density based least cost optimal path

The population density is calculated for every village to find the least cost optimal path based on it. For this purpose, the area of each village is calculated and added as an attribute. To compute the area, the coordinate reference system of the shapefile is converted from a Geographic Coordinate System (GCS) to a Projected Coordinate System (PCS). WGS 1984 UTM zone 44 N is selected as the PCS.

The population data of the year 2011 for each village is downloaded from the Census of India website. A new field is added to the attribute table of the village boundary layer, and the population density is calculated in it using the field calculation tool. The population density map of the Prayagraj district is shown in Fig. 9. The population density map is classified into five categories. Based on these population density categories, an average speed limit in km per hour is assumed (Meena et al., 2019). It is shown in Table 7.
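The area and density calculation can be illustrated outside the GIS software. The sketch below assumes that the village polygon is already in a projected system with coordinates in metres (such as UTM zone 44 N) and uses the shoelace formula for a simple polygon without holes; the function names and the sample village are hypothetical.

```python
def polygon_area_m2(vertices):
    """Planar area of a simple polygon (shoelace formula).

    vertices: list of (easting, northing) pairs in metres.
    """
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def population_density(population, area_m2):
    """People per square kilometre."""
    return population / (area_m2 / 1_000_000.0)


# Hypothetical rectangular village of 2 km x 1 km with 2,180 inhabitants.
village = [(500000, 2800000), (502000, 2800000),
           (502000, 2801000), (500000, 2801000)]
area = polygon_area_m2(village)
print(area, population_density(2180, area))
```

Computing the area in geographic coordinates (degrees) would give meaningless values, which is why the projection step comes first.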

Fig. 9 Village wise population density map of Prayagraj district

Table 7 Average speed defined based on population density

The road shapefile is split by the village boundaries by using the identity tool in ArcGIS software to assign the population density weight to the road. The network analyst tool is used to compute the least cost optimal path based on the population density. Its map is shown in Fig. 10, where green points show schools' locations while red and sky blue points show the source and destination position, respectively. The red line shows the road and the blue line indicates the optimal path from the source to the destination point. The obtained path will be the least crowded path for reaching the school.

Fig. 10 Least cost optimal path based on population density

Traveling distance based least cost optimal path

Travel distance or length is another weight function used to measure the least cost optimal path from source to destination for Prayagraj district schools. Travel distance is the most common parameter for an optimal path. Using the calculate geometry tool in ArcGIS software, the length of each road is calculated and the weight is assigned accordingly. The least cost optimal path based on traveling distance is shown in Fig. 11. The obtained path will be the shortest path for reaching the school.
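A least cost path over a weighted road network can be computed with Dijkstra's algorithm, which is a standard choice for non-negative edge weights. The source does not state which solver the network analysis tool applies internally, so the following is only a sketch of the idea; the graph, node names and lengths are hypothetical. The same function works for any of the weights discussed here by changing the cost stored on each edge.

```python
import heapq


def build_graph(edges):
    """Build an undirected adjacency list from (node_a, node_b, cost) triples."""
    graph = {}
    for a, b, cost in edges:
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    return graph


def least_cost_path(graph, source, target):
    """Dijkstra's algorithm; returns (total_cost, list_of_nodes)."""
    best = {source: 0.0}
    previous = {}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, weight in graph.get(node, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(queue, (new_cost, neighbour))
    if target not in best:
        return float("inf"), []  # no road connection
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[target], path[::-1]


# Hypothetical road segments with lengths in metres.
roads = [("home", "A", 400), ("A", "school", 700),
         ("home", "B", 300), ("B", "school", 600), ("A", "B", 100)]
graph = build_graph(roads)
print(least_cost_path(graph, "home", "school"))
```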

Fig. 11 Least cost optimal path based on traveling distance

Traveling time based least cost optimal path

To calculate the traveling time in the road layer shapefile, the average speed limit is assumed on a population density basis in Table 8. The assumed speed limit is in kilometers per hour (km/h), so it is converted into meters per second (m/s). To calculate the time, a new field is added in the road shapefile attribute table and, with the help of the field calculate tool, the traveling time for each road is calculated in seconds. A new least cost optimal path based on traveling time is computed using a network analysis tool and is shown in Fig. 12. The obtained path will be the least time taking path for reaching the school.
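The unit conversion and travel time calculation described above amount to two short formulas: speed in m/s = speed in km/h × 1000 / 3600, and time in seconds = length in metres / speed in m/s. A minimal sketch follows; the speed and length values are illustrative and are not the values assumed in the study's tables.

```python
def kmh_to_ms(speed_kmh):
    """Convert kilometres per hour to metres per second."""
    return speed_kmh * 1000.0 / 3600.0


def travel_time_s(length_m, speed_kmh):
    """Time in seconds needed to traverse a road segment."""
    return length_m / kmh_to_ms(speed_kmh)


# Hypothetical 500 m segment with an assumed speed of 36 km/h.
print(kmh_to_ms(36))
print(travel_time_s(500, 36))
```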

Table 8 Unit weight of population density, traveling distance and traveling time
Fig. 12 Least cost optimal path based on traveling time

Combined weight based least cost optimal path

A combined weight function is defined to find the least cost optimal path that gives the best results by considering all three parameters. This function is the amalgamation of Population Density (PD), Traveling Distance (TD) and Traveling Time (TT). The weights of population density, traveling distance and traveling time are calculated as shown in Table 8.

In rural or urban areas, traffic is affected by population density, so in the combined weight function the weight of population density is multiplied by three. The traveling distance and time play an essential role in selecting the route to reach the school. Therefore, the weight of traveling distance is multiplied by two and the weight of traveling time is multiplied by one. The combined weight function is defined as:

$$\text{Combined weight} = \frac{3 \times \text{Weight of PD} + 2 \times \text{Weight of TD} + 1 \times \text{Weight of TT}}{6}$$

A new field is added in the road attribute table for combined weight. The combined weight is calculated using the above equation. The attribute table is prepared for these parameters. The least cost optimal path based on combined weight is shown in Fig. 13.
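The field calculation for the combined weight is a weighted average with multipliers 3, 2 and 1, divided by their sum, 6. The sketch below applies it to one road segment; the function name and the unit weights passed in are hypothetical, not values from Table 8.

```python
def combined_weight(weight_pd, weight_td, weight_tt):
    """Weighted average of the three unit weights with multipliers 3, 2 and 1."""
    return (3 * weight_pd + 2 * weight_td + 1 * weight_tt) / 6


# Hypothetical unit weights for one road segment.
print(combined_weight(5, 2, 3))
```

Because the multipliers sum to 6, a segment whose three unit weights are equal keeps that same value as its combined weight.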

Fig. 13 Least cost optimal path based on combined weight

School accessibility analysis

Officials often need to visit all the schools for different purposes such as inspection, distribution of study material, etc. Therefore, it is crucial to know the accessibility of the schools and the sequence of visits. TSP is used for this purpose, as it is a network analysis method that gives the best route to visit spatially separated stops and return to the starting point of the journey.

TSP is a process of visiting all schools one by one, starting from a particular source point and returning to that starting point after visiting all schools. In the TSP, each school is visited only once, and the visitor then moves to the next nearest school using the least cost optimal path. This process continues until all schools have been visited. The route followed by the visitor is known as the least cost optimal path for the TSP.

Given a set of schools and the distance, time and population density weights between every pair of schools, the TSP finds the least cost route that visits every school exactly once and returns to the starting point. TSP also helps to find whether or not a road is available to reach a particular school. The input given to the TSP is the location of all schools and the road network. A weight is assigned to the route between any two schools. It is based on population density, travel length, travel time or a combination of all these parameters.
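The visiting rule described above (always move to the next nearest unvisited school, then return to the start) corresponds to the nearest neighbour heuristic for the TSP. The following sketch implements that heuristic on a small cost table; it is an illustration only, since the heuristic does not guarantee the optimal tour and the solver used by the network analysis tool is not specified in the text. The stop names and costs are hypothetical, and a missing entry in the table means that no road path exists.

```python
from math import inf


def nearest_neighbour_tour(cost, start):
    """Greedy tour: repeatedly visit the cheapest unvisited stop, then return.

    cost[a][b] is the least cost between stops a and b.
    Returns (tour, total_cost, unreachable_stops).
    """
    unvisited = set(cost) - {start}
    tour, total, current = [start], 0.0, start
    unreachable = []
    while unvisited:
        # Tie-break on the stop name so the result is deterministic.
        nxt = min(unvisited, key=lambda s: (cost[current].get(s, inf), s))
        step = cost[current].get(nxt, inf)
        if step == inf:
            unreachable = sorted(unvisited)  # no road to the remaining stops
            break
        tour.append(nxt)
        total += step
        unvisited.remove(nxt)
        current = nxt
    if current != start:
        total += cost[current].get(start, inf)
        tour.append(start)
    return tour, total, unreachable


# Hypothetical symmetric costs (km); S4 has no road connection.
cost = {
    "office": {"S1": 2, "S2": 5, "S3": 4},
    "S1": {"office": 2, "S2": 2, "S3": 6},
    "S2": {"office": 5, "S1": 2, "S3": 1},
    "S3": {"office": 4, "S1": 6, "S2": 1},
    "S4": {},
}
tour, total, unreachable = nearest_neighbour_tour(cost, "office")
print(tour, total, unreachable)
```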

The TSP map of each tehsil is shown in Fig. 14. In this map, the purple points show the schools that have road connectivity. The sequence number of the visit is mentioned inside each point. Schools that are not reachable or not connected by a road are shown by grey points with question marks. The sky blue point shows the starting point of the visit. On the basis of the TSP solution, it is found that in the Prayagraj district, 253 schools are not connected with road facilities within 1000 m. TSP thus helps to analyze whether or not the visited schools are connected with the road network, so the new routes required to connect every school of Prayagraj can be identified. TSP can be used to plan the route of a government officer surveying the schools. It can also be used to distribute books, food materials and other items that the government wants to distribute in schools. Based on the TSP solution, sites can be suggested to the government for new road construction in order to increase the transport connectivity of schools.
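The 1000 m connectivity check can be illustrated independently of the TSP solver by measuring the distance from each school to its nearest road segment in projected coordinates. This is a sketch under the assumption that a school counts as unconnected when no road lies within the tolerance; the function names, coordinates and the single road segment are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to segment a-b (metres)."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0:
        return math.hypot(px - ax, py - ay)  # degenerate segment
    # Projection parameter clamped to the segment ends.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def unconnected_schools(schools, segments, tolerance_m=1000.0):
    """Names of schools with no road segment within the tolerance."""
    result = []
    for name, point in schools.items():
        nearest = min((point_segment_distance(point, a, b)
                       for a, b in segments), default=math.inf)
        if nearest > tolerance_m:
            result.append(name)
    return result


# Hypothetical road along the x axis and three schools (metres).
segments = [((0, 0), (5000, 0))]
schools = {"A": (2500, 400), "B": (2500, 1500), "C": (6000, 0)}
print(unconnected_schools(schools, segments))
```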

Fig. 14 TSP solution for the schools of each tehsil of Prayagraj district

Discussion

This research work is focused on the 4,878 schools of the Prayagraj district, India. It investigates the use of GIS in the education sector. The spatial analysis of the various parameters highlights the education-related issues of the study area. This can help to eradicate the regional disparity in educational resources. Maps are created that illustrate these problems. They indicate the spots where there is an imbalance of facilities, as discussed below.

Gender disparity in education is a big issue, especially in the implementation of RTE and NEP 2020. Girls' enrollment is lower than that of boys at all levels of education (Goel & Husain, 2018). Gender disparity in education adversely affects women's conditions, which has a multiplier impact on the whole society. Husain (2011) has reported that gender bias is more prominent in North Indian states like Uttar Pradesh, Rajasthan, Bihar and Madhya Pradesh. In rural areas, this problem is more accentuated than in urban areas. Gender disparity is also seen in the enrollment of students from elementary to secondary education (Kumar et al., 2021). There are many factors behind it, like the low socio-economic status of a family, school accessibility and the cost of school fees, as RTE provides free education only for elementary schooling. Sahoo (2017) conducted a three period longitudinal study on 43 villages of Uttar Pradesh and found that parents are more willing to spend money on private schools for their sons than for their daughters. This tendency is more pronounced in rural areas. The government is trying to bridge the gender gap in school education through schemes like Kanyashree, Beti Bachao Beti Padhao, National Schemes of Incentive to Girls for Secondary Education, Balika Samriddhi Yojana, etc. Close monitoring of the gender ratio of school students is required. According to the student gender ratio analysis carried out in this work, girls are fewer than boys in 2,302 out of 4,878 schools. The obtained results are in line with previous research. However, an improvement in the girls to boys ratio is observed. As per census data, 81.0% of males and 55.5% of females are literate, which indicates an improvement in the gender wise literacy in the district.

When parents or guardians consider admitting their child to a particular school, one of the parameters that they usually look at is the student teacher ratio. It is the number of students taught by a single teacher in a class. As per RTE, this ratio must be 30:1 and 35:1 at the primary and upper primary levels, respectively. Earlier studies found that the ratio still does not meet the recommended standard. The majority of highly educated teachers want to be in the city regions; therefore, this gap in the ratio is more prominent in rural areas than in urban regions (Govinda & Bandyopadhyay, 2010). In this research work, it was found that 222 out of 4,878 schools have a single teacher for more than 100 students and that most of these schools lie in the rural region. The student teacher ratio analysis carried out in this work demonstrated that the student teacher ratio is better in the city region. It has an impact on the literacy rate and the quality of education.

The presence of toilets and proper hygiene conditions in schools has multifaceted effects. It improves health, gender parity, enrollment and attendance among students (Kim & Rhee, 2019). Adukia (2017) conducted a study on 269 districts of 9 states in India using DISE, Annual Status of Education Report (ASER) and census data. The author conducted an empirical analysis and found a positive correlation between enrollment and toilet facilities. The study also revealed the positive impact of gender specific toilets on enrollment, especially among pubescent girls. Ray and Datta (2017) conducted a study on the impact of gender specific toilets on female enrollment in India's primary and upper primary schools. They also reported an improvement in the gender parity indices of enrollment because of gender specific toilets. The Government of India and the United Nations Children's Fund (UNICEF) have taken several initiatives in the direction of sanitation and toilet building, like the School Sanitation and Hygiene Education program, Swachh Bharat: Swachh Vidyalaya, Total Sanitation Campaign, etc. Spatial mapping and analysis will help in planning and monitoring this facility, and the student toilet ratio analysis carried out in this work supports this. In 302 out of 4,878 schools, there is a single toilet for more than 200 students. It is found that more work is needed on the sanitation and hygiene conditions of schools, especially in rural areas. Over time, however, there has been a considerable improvement in sanitation and hygiene infrastructure as well as in public awareness of it.

GIS-based analysis of school routes is carried out. On the basis of the TSP solution, it was found that in the Prayagraj district, 253 schools were not connected with road facilities within 1 km. The optimal path analysis has also been performed based on different parameters, viz, population density, traveling distance and traveling time. This work also gave the optimal route to visit all the schools of the study area, which could be a very useful solution for administrators who need to visit all the schools. Commuting is a big issue for pupils, and it is addressed in this research. The study has depicted those places where more roads need to be built to improve school accessibility. Children face difficulty because of the uneven distribution of roads between urban and rural areas. There is a need to construct more roads and pedestrian lanes to reach the schools. The findings of this work can help academic policy makers in providing a better solution for the area.

The government's past and existing schemes are mentioned along with the related statistics, which will be helpful to others working in this field. Future research can expand this work to other districts and include more parameters.

Conclusion

As per the first objective, an integrated geographic database is prepared from multi source data. It includes the spatial and attribute data of schools, roads and villages of the Prayagraj district. The location of 4,878 schools is stored, among which 650 school locations are collected through the GPS based field survey while the remaining 4,228 school locations are collected from the School GeoPortal website. The road and village related data are prepared from OpenStreetMap and the census handbook. The prepared comprehensive database will be useful for the studies on primary schools of the Prayagraj district. All the datasets are transformed into a common geographic and projected coordinate system. Therefore it becomes easy to perform requisite spatial analysis leading to informed decision-making on these datasets.

As per the second objective, spatial analysis is carried out for student gender ratio, student teacher ratio and availability of different infrastructure facilities. A major contribution of this study is the identification of several problems through GIS based analysis, which are listed here. There are 222 schools where there is a single teacher for more than 100 students. In 302 schools, there is a single toilet for more than 200 students. The results identify the location of all those schools where corrective measures are needed. Based on these results, expenditure can be reallocated more judiciously to improve the condition of such schools. The outcomes of the spatial analysis of schools assist in conceptualizing the holistic development of a child's education, which involves the various facilities provided by the schools and the standards set by the GoI under RTE. One of the biggest problems is the effective and timely implementation of NEP 2020 across all the schools. The developed solutions will be beneficial in the effective implementation and monitoring of the NEP 2020.

Commuting between home and school is a big issue in child education, which is handled in this research work through the third objective. School commuting has financial implications, as students might have to use motorized transport. Sometimes, it is the sole criterion in the selection of a school. In this work, a detailed route analysis is carried out using different parameters, viz, population density, travel length, travel time and a combination of all these. The results help to identify the optimal path based upon the requirement. School accessibility analysis is also covered in this work. Sometimes, government officials are required to visit all the schools to distribute study materials, food items and other things. The school accessibility results obtained in this research through TSP will assist in choosing the best possible route. They will also help the government in identifying those locations where new roads need to be constructed to ensure proper connectivity and transportation facilities.