Toward an improved learning process: the relevance of ethnicity to data mining prediction of students’ performance

Adekitan, Aderibigbe Israel; Salau, Odunayo

doi:10.1007/s42452-019-1752-1


Research Article
Published: 02 December 2019

Volume 2, article number 8, (2020)
SN Applied Sciences


Abstract

The ability to predict failure is an advantageous educational tool that can be used effectively to counsel students, and it may also be used to develop and channel adequate academic interventions toward preventing failure and dropout. Students are generally admitted on the basis of their evaluated academic potential, as measured by their admission criteria scores. Using a Nigerian university as a case study, this study seeks to identify the relationship, if any, between the admission criteria scores and the graduation grades, and to examine how ethnicity, represented by the student's geopolitical zone of origin, influences the predictive accuracy of the models developed. Data mining analyses were carried out using four classifiers in the Orange software, and the results were verified with multiple regression analysis. The maximum classification accuracy observed is 53.2%, which indicates that the pre-admission scores alone are insufficient for predicting the graduation results of students, although they may serve as a useful guide. Applying an over-sampling technique increased the accuracy to 79.8%. The results establish that the ethnic background of the student is statistically insignificant in predicting graduation results. Hence, the use of ethnicity in admission processes is not ideal.
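The abstract reports that over-sampling raised the accuracy from 53.2% to 79.8%, but it does not say which over-sampling method was used. As a minimal sketch, assuming plain random over-sampling (duplicating randomly chosen rows of the under-represented grade classes until every class is as large as the largest one), the idea can be written in Python as follows. The helper name `random_oversample` and the toy rows are hypothetical and are not taken from the study.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Hypothetical helper: plain random over-sampling.

    Randomly chosen rows of each smaller class are duplicated until every
    class has as many rows as the largest class. The original rows are kept
    first, in their original order.
    """
    rng = random.Random(seed)  # seeded so the result is reproducible
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        members = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(members))
            out_labels.append(cls)
    return out_rows, out_labels

# Toy, made-up (JAMB score, WAEC aggregate) pairs with grade-class labels
rows = [(250, 20), (240, 22), (260, 18), (210, 30), (300, 12)]
labels = ["2:1", "2:1", "2:1", "2:2", "First"]
new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))  # each class now has 3 rows
```

In general practice, re-sampling is applied to the training portion of the data only, so that duplicated rows do not end up in both the training and the test sets.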


1 Introduction

Education is generally said to be the bedrock of national growth and national integration, but studies over the years have shown that the relationship between national integration and education is not linear [1]. Higher education in Nigeria began in 1932 with the establishment of Yaba Higher College [2], and in 1940, the University College at Ibadan was established in an effort to promote higher education within the colony. According to Okebukola [3], as quoted by Olujuwon [2], regional universities were created based on the recommendations of the Ashby Commission: the University of Nigeria was established in the East in 1960, the University of Ife in 1961, and in 1962 the University College at Ibadan was granted full university status. In the North, Ahmadu Bello University was established in 1962. This marked the beginning of region-based education in Nigeria. At the time of writing, there are 165 universities in Nigeria according to the National Universities Commission: 43 federal universities, 47 state-owned universities and 75 private universities. Yet only 29% of Joint Admissions and Matriculation Board (JAMB) applicants are admitted into a university [4] because of the limited admission slots [5].

Nigeria, with a land area of 923,768 km² and an estimated population of over 190 million [7], has thirty-six (36) states, which are grouped into six (6) geopolitical zones, namely South South, South West, South East, North East, North West and North Central, as shown in Fig. 1. The states were aggregated based on their cultural similarities, ethnicity and common history [8, 9]. With more than two hundred and fifty ethnic groups and over 500 languages in Nigeria, the government had to develop a model for ensuring adequate allocation of political, economic and educational resources across the regions, and this was achieved by grouping the country into states and geopolitical zones [10]. Over time, it became evident that the reception and value of education vary across the geopolitical zones in Nigeria.

According to Ukiwo [1], access to education in Nigeria has been politicised because of the plural nature of the society, and this tends to engender economic inequalities. Nigeria is largely divided along ethnic and religious lines, and these factors strongly influence how government policies and efforts are generally perceived and interpreted. The inequality and politics of higher education started in the early 1900s, with the intense pursuit of educational development and attainment in the south-west region of the country while the Northern region lagged behind. To put this in perspective, according to Abernethy [11] as quoted by Ukiwo [1], although the North had about 55% of the national population in 1912, 1926 and 1937, primary school enrolment in the North stood at 950, 5200 and 20,250 respectively, while in the South the enrolment figures were 35,700, 138,250 and 218,600, indicating a significant disparity in regional educational status. At the birth of the Nigerian nation, the Eastern and Western regions of the country had achieved far greater growth in educational attainment than the Northern region. The old Northern region, which lagged behind educationally, now comprises three geopolitical zones: North West (NW), North East (NE) and North Central (NC), while the former Eastern and Western regions in the South, which were more developed educationally, make up the remaining three geopolitical zones.
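The scale of this disparity can be made concrete by dividing the southern enrolment figures by the northern ones. The short Python sketch below uses only the figures quoted above; the function name `enrolment_ratios` is illustrative.

```python
def enrolment_ratios(north, south):
    """Number of southern primary-school pupils per northern pupil, per year."""
    return [s / n for n, s in zip(north, south)]

years = [1912, 1926, 1937]
north = [950, 5200, 20250]        # northern enrolment, as quoted above
south = [35700, 138250, 218600]   # southern enrolment, as quoted above

for year, ratio in zip(years, enrolment_ratios(north, south)):
    print(f"{year}: {ratio:.1f} southern pupils per northern pupil")
# 1912: 37.6, 1926: 26.6, 1937: 10.8
```

Although the gap narrowed over the period, southern enrolment in 1937 was still more than ten times northern enrolment, even though the North held about 55% of the population.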

Primary and secondary education are vital for tackling illiteracy and for ensuring national development. Higher education is required to develop a total man equipped with adequate knowledge and the capacity to handle today's societal problems and developmental needs [12]. With a focus on higher education, the National Universities Commission was set up in 1962 as an advisory agency, and it became a statutory body under Decree 1 of 1974, charged with ensuring the adequate development and regulation of university education in Nigeria. Increasing demand for university education has created a daunting task for regulatory bodies, which must increase admission slots while enforcing quality education in the face of ever-present national challenges such as inadequate finance, insufficient educational facilities, unavailable material resources and variations in regional educational status.

Regional differences in the quality and acceptance of education have been drastically reduced over the years through various deliberate regulatory efforts, but evidence of them still exists in the Nigerian education sector. Admission into Nigerian institutions is often influenced by the geopolitical zone of the applicants: some states in Nigeria are considered educationally disadvantaged, some universities have catchment areas, and the cut-off mark (admissible score) for students from these regions is set lower than the general cut-off mark for students from other regions, even though they all sat for the same entrance examination. Likewise, government-related recruitment is also polarised, with deliberate efforts put in place to ensure that successful applicants are selected from all the geopolitical zones. To achieve this, the pass mark for some regions is sometimes deliberately reduced relative to others. These practices create conflicting opinions: while some Nigerians are pleased that they ensure a fair sharing of opportunities among all the geopolitical zones, others consider them unfair because they do not ensure the success of the best candidates, and therefore not the best option if Nigeria desires to attain its full human resource potential and maximise its entrepreneurial capability and governmental performance [13].

Does the geopolitical zone of origin of students affect their academic performance in higher institutions? In this study, the effect of a student's geopolitical zone of origin on the student's anticipated graduation result is examined using a number of selected admission criteria. The dataset analysed in this study is from a private Nigerian university with broad admission criteria that enable students from all the geopolitical zones of the country to compete fairly for the available admission slots. Predictive analyses using data mining in the Orange software, together with regression models, were carried out to identify hidden knowledge and vital statistical trends for students from the six geopolitical zones in Nigeria, in order to understand the impact, if any, of ethnicity [14] on the accuracy of predicting the graduation CGPA of university students in Nigeria from selected features.
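One common way to let a classifier or a multiple regression model use the geopolitical zone alongside numeric admission scores is to encode the zone as indicator (dummy) variables. The Python sketch below is an assumption for illustration, not the encoding reported by the authors: the function name `encode_student`, the feature order and the choice of South West as the reference zone are all hypothetical.

```python
ZONES = ["North Central", "North East", "North West",
         "South East", "South South", "South West"]

def encode_student(jamb, waec, zone, baseline="South West"):
    """Hypothetical encoder: [JAMB score, WAEC aggregate] followed by one 0/1
    indicator for every zone except the baseline (reference) zone."""
    if zone not in ZONES:
        raise ValueError(f"unknown geopolitical zone: {zone!r}")
    indicators = [1 if zone == z else 0 for z in ZONES if z != baseline]
    return [jamb, waec] + indicators

print(encode_student(250, 20, "North East"))  # [250, 20, 0, 1, 0, 0, 0]
print(encode_student(250, 20, "South West"))  # [250, 20, 0, 0, 0, 0, 0]
```

Dropping one zone avoids perfect collinearity with the regression intercept; each remaining zone coefficient is then interpreted as a difference from the reference zone.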

2 Background

Data-related research is on the increase because of its enormous potential benefit in terms of knowledge acquisition and application [15, 16]. Educational data mining is the application of data mining methodologies in education-related research toward solving education-related issues [17]. It entails the stepwise extraction of hidden and useful information from datasets [18] generated within the education sector in order to better understand students and the effectiveness of the learning process. Educational data mining converts seemingly meaningless data into useful knowledge that can greatly impact practices and regulatory methods within the education domain. Generally, educational data mining comprises data collection, data sorting and pre-processing, data mining, and post-processing of the data mining results. The data mining techniques deployed include clustering, text mining, association rule mining and classification. The mode of delivery of education has moved beyond the traditional classroom-based method to various web-based teaching platforms and adaptive e-learning systems [19] owing to the availability of modern learning technologies [20,21,22]; this enables diverse education-related data to be generated, logged, processed and studied for a better understanding of the learning process and learners [23], and of educational technology integration practices [24].
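The data sorting and pre-processing stage often involves discarding incomplete records and rescaling numeric features. A minimal Python sketch, assuming records are dictionaries and that min-max scaling to [0, 1] is the chosen rescaling (neither is specified in the text), is shown below; the field names `jamb` and `cgpa` are hypothetical.

```python
def preprocess(records, fields):
    """Drop records missing any of `fields`, then min-max scale each field."""
    complete = [r for r in records if all(r.get(f) is not None for f in fields)]
    if not complete:
        return []
    scaled = [dict(r) for r in complete]  # copy so the input is left untouched
    for f in fields:
        lo = min(r[f] for r in complete)
        hi = max(r[f] for r in complete)
        span = hi - lo
        for r in scaled:
            # A constant field carries no information, so map it to 0.0
            r[f] = (r[f] - lo) / span if span else 0.0
    return scaled

raw = [{"jamb": 200, "cgpa": 3.0},
       {"jamb": 300, "cgpa": None},   # incomplete record, dropped
       {"jamb": 250, "cgpa": 4.0}]
print(preprocess(raw, ["jamb", "cgpa"]))
# [{'jamb': 0.0, 'cgpa': 0.0}, {'jamb': 1.0, 'cgpa': 1.0}]
```

When a model is later applied to new students, the minimum and maximum would normally be taken from the training data rather than recomputed.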

In a higher institution, various types of data can be collected with different levels of relationship and hierarchy; as such, the type of knowledge that can be mined from an educational dataset is a function of the nature and origin of the data. Educational data mining can serve different objectives [17]. It helps to understand the relationship between educators and students; it reveals weaknesses and gaps in the learning process; it can be used to predict the potential for negative student behaviours [23], dropout and students' performance [25, 26]; it aids the development and review of learning models; it can be used to measure the effectiveness of any intervention deployed; and it may also be used to guide the learning efforts of learners. The quality and effectiveness of decision processes can be greatly enhanced using educational data mining [27], and, likewise, vital feedback from students can be evaluated using data mining techniques to identify lapses, areas of need and areas for improvement in teaching and learning processes. Through data mining techniques, students can be classified into distinct groups based on well-defined criteria, enabling the deployment of purpose-specific, targeted learning interventions and the identification of common skill sets, social attitudes, learning behaviours [28, 29] and interests [30, 31]. The effectiveness of academic modules and the development of new content can also be evaluated using data mining techniques [32].

2.1 Related literature

In the study by Hussain et al. [33], a 24-attribute dataset was evaluated using the Bayes Network, Random Forest, J48 and PART classifiers on the WEKA data mining platform to predict the semester performance of students. A predictive Multi-Layer Perceptron model was developed by Nurhayati et al. [34] to evaluate the records of 292 students, using 5 features to predict their graduation potential. In Adekitan and Salau [35], the data of 1445 students covering the academic sessions from 2005 to 2009 was evaluated using data mining techniques to determine the extent of the correlation between the admission selection scores and the academic performance of students in their first academic year. Similarly, Ahmad et al. [36] predicted the first-year performance of students by data mining 399 student records with rule-based, decision tree (J48) and naïve Bayes classifiers. Using a dataset generated at a Bulgarian university, Kabakchieva [37] demonstrated the use of data mining techniques to enhance university management decision-making by extracting knowledge from 10,330 student records with 20 features using the WEKA software.
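Ahmad et al. [36] used a naïve Bayes classifier among others. For readers unfamiliar with it, the sketch below is a generic categorical naïve Bayes with add-one (Laplace) smoothing written in plain Python. It is not the implementation used by WEKA or Orange, and the helper names, feature bands and labels in the example are made up.

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count class priors and per-class value frequencies for each feature."""
    priors = Counter(labels)
    counts = defaultdict(Counter)  # (feature index, class) -> Counter of values
    values = defaultdict(set)      # feature index -> distinct values seen
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            counts[(i, y)][v] += 1
            values[i].add(v)
    return priors, counts, values, len(labels)

def predict_nb(model, row):
    """Return the class with the highest smoothed log-posterior score."""
    priors, counts, values, n = model
    best, best_score = None, -math.inf
    for y, ny in priors.items():
        score = math.log(ny / n)  # log prior
        for i, v in enumerate(row):
            # Add-one smoothing: an unseen value never drives the score to -inf
            score += math.log((counts[(i, y)][v] + 1) / (ny + len(values[i])))
        if score > best_score:
            best, best_score = y, score
    return best

# Made-up (JAMB band, WAEC band) features with a coarse outcome label
rows = [("high", "good"), ("high", "good"), ("low", "poor"), ("low", "poor")]
labels = ["upper", "upper", "lower", "lower"]
model = train_nb(rows, labels)
print(predict_nb(model, ("high", "good")))  # upper
```

The classifier assumes that features are conditionally independent given the class; summing logarithms instead of multiplying probabilities avoids numeric underflow when there are many features.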

The study by Alharbi et al. [38] carried out data mining analysis for the early identification of at-risk students using their admission records and their performance in the first year of study. The results of a student-failure potential analysis create an opportunity to warn at-risk students of a potential failure early enough for remedial interventions to be deployed [39]. In the study by Atta Ur et al. [40], students' acceptance of the course timetable and teaching methods was measured using machine learning algorithms, by administering an investigative questionnaire consisting of 38 teaching and learning related questions. Learning analytics can be defined as the measurement, gathering, investigation and reporting of relevant data about students, the learning process and methods [41, 42]. The research by Bharara et al. [43] identified new metrics relevant to the learning process in order to develop a robust model for evaluating student performance, using four categories of features: interactional features, academic features, the level of parents' participation in their children's education, and demographic features. In that study, a student dataset containing 500 samples was analysed using clustering data mining methodologies.
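Clustering, as used by Bharara et al. [43], groups students without predefined labels. The text does not state which clustering algorithm was used, so the sketch below assumes k-means (Lloyd's algorithm), one of the most common choices. The function names and the two-feature toy points are invented for illustration.

```python
import random

def _nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])))

def kmeans(points, k, iters=100, seed=0):
    """Plain k-means; returns (centroids, cluster label for each point)."""
    rng = random.Random(seed)
    centroids = [tuple(map(float, p)) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[_nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members; keep it if the cluster is empty
        new = [tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[j]
               for j, cl in enumerate(clusters)]
        if new == centroids:  # assignments have stabilised
            break
        centroids = new
    labels = [_nearest(p, centroids) for p in points]
    return centroids, labels

# Invented (normalised admission score, normalised CGPA) pairs
points = [(0.1, 0.2), (0.15, 0.1), (0.9, 0.8), (0.85, 0.95)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Because k-means depends on the initial centroids, results are usually compared over several seeds, and features on very different scales are normally rescaled first.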

3 Data descriptive statistics

This section presents the statistical attributes of the dataset of 2413 students across the four colleges of Covenant University. Table 1 shows the descriptive statistics of the JAMB score, WAEC aggregate and CGPA of the 2413 students. Figures 2 and 3 show the probability density function plot and the cumulative probability plot of the JAMB score, respectively. Figures 4 and 5 present the probability density function plot and the cumulative probability plot of the WAEC aggregate score, respectively, while Figs. 6 and 7 show the probability density function plot and the cumulative probability plot of the students' CGPA at graduation, respectively. To show the variations in the graduation CGPA of students across the six geopolitical zones, the CGPA data is presented as a box plot in Fig. 8, while the box plot in Fig. 9 shows the CGPA variations across the four colleges. As shown in Fig. 8, North East and North West students had the lowest average CGPAs, 3.369 and 3.366 respectively. Figure 9 shows that students from the College of Engineering had the highest average CGPA, 3.52, while the College of Science and Technology had the lowest, 3.34. Figure 10 presents the distribution of the number of students per grade classification for the six geopolitical zones.
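The per-zone box plots in Fig. 8 summarise each group by its quartiles, and the averages quoted above are group means. A minimal Python sketch of how such a per-group summary could be computed from (zone, CGPA) pairs is shown below. The function names and the sample values are hypothetical, not the study's data, and plotting tools may use a different quartile convention from the `statistics.quantiles` default, so quartiles can differ slightly between tools.

```python
import statistics

def box_summary(values):
    """Five-number summary plus mean (needs at least two values)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values), "mean": statistics.mean(values)}

def summarise_by_group(records):
    """records: iterable of (group, cgpa) pairs -> {group: summary}."""
    groups = {}
    for group, cgpa in records:
        groups.setdefault(group, []).append(cgpa)
    return {g: box_summary(v) for g, v in groups.items()}

# Made-up CGPAs for two zones, for illustration only
sample = [("NE", 3.1), ("NE", 3.4), ("NE", 3.6),
          ("SW", 3.2), ("SW", 3.5), ("SW", 3.9), ("SW", 4.2)]
for zone, s in summarise_by_group(sample).items():
    print(zone, round(s["median"], 2), round(s["mean"], 3))
```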

Table 1 Descriptive statistics of the numeric data features
