
A novel prediction model for educational planning of human resources with data mining approach: a national tax administration case study


Human resources training is considered an effective way to empower staff. Organizations try to plan training effectively for this precious resource by identifying shortcomings through a needs assessment. This study provides a model based on organizational data analysis to produce a unique and appropriate training plan for each staff member. Job performance, organizational promotion and lay-off were taken as the basis for staff training planning. For this purpose, the tax assessors' information was investigated. The CRISP-DM methodology was selected, and the project was implemented accordingly. A decision tree model was selected to extract unknown rules and patterns for educational decision-making about staff, and a neural network model was selected as the predictive model for the target variables. The results showed that the decision tree was more effective in predicting job performance and organizational promotion status, whereas the neural network model was more effective in predicting service lay-off.



  1. Using MATLAB software.

  2. This variable has two classes, which indicate whether the employee was promoted (1) or not (2).

  3. The ratio of the confidence of each rule to its initial (prior) probability.

  4. Recall (sensitivity) is the percentage of actual positive cases that the model correctly identifies (Liu et al., 2021).

  5. Precision (positive predictive value) is the percentage of the model's positive predictions that are relevant, i.e. actually positive (Liu et al., 2021).
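The measures defined in notes 3 to 5 can be illustrated with a minimal sketch. It assumes a binary target coded as in note 2 (1 = positive class, 2 = negative class); the function names and the sample labels are hypothetical and are not taken from the study.

```python
def precision_recall(actual, predicted, positive=1):
    """Return (precision, recall) for the class labelled `positive`."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)  # true positives
    fp = sum(1 for a, p in pairs if a != positive and p == positive)  # false positives
    fn = sum(1 for a, p in pairs if a == positive and p != positive)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def lift(rule_confidence, prior_probability):
    """Ratio of a rule's confidence to the prior probability of its class."""
    return rule_confidence / prior_probability


# Hypothetical labels for eight employees (1 = promoted, 2 = not promoted).
actual = [1, 1, 1, 2, 2, 2, 2, 1]
predicted = [1, 1, 2, 1, 2, 2, 2, 1]
precision, recall = precision_recall(actual, predicted)
```

In this toy example three of the four predicted promotions are correct and three of the four actual promotions are found, so both measures equal 0.75.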


  • Abri Aghdam, K., Aghajani, A., Kanani, F., Sanjari, M. S., Chaibakhsh, S., Shirvaniyan, F., Moosavi, D., & Moghaddasi, M. (2021). A novel decision tree approach to predict the probability of conversion to multiple sclerosis in Iranian patients with optic neuritis. Multiple Sclerosis and Related Disorders, 47, 102658.

  • Abtahi, S. H. (2004). Training and upgrading human capital. Poyandeh Publications. (In Persian).

  • Akhavan, M., & Kazemi Gorji, A. (2019). The impact of training on productivity and human resources to investigate the role of intermediary organizational agility and intellectual capital (the case of the eighth base Babai martyr of prey). Journal of Training in Police Sciences, 26(26), 25–54. (In Persian).

  • Alipour, K., Prdsry, I. G., & Zolfaghari Zafarani, R. (2019). Provide a model to improve the efficiency of human resource training in Islamic Azad University. Journal of New Approaches in Educational Administration, 10(38), 179–208. (In Persian).

  • Aparnak, A., & Ghasemi, P. (2016). Measuring the performance of bank employees with a multi-criteria decision approach. In: 2nd International Conference on Modern Research in Management and Industrial Engineering.

  • Ashraf, M., Zaman, M., & Ahmed, M. (2020). An intelligent prediction system for educational data mining based on ensemble and filtering approaches. Procedia Computer Science, 167, 1471–1483.

  • Asif, R., Merceron, A., Ali, S. A., & Ghani Haider, N. (2017). Analyzing undergraduate students’ performance using educational data mining. Computers & Education, 113, 177–194.

  • Burgos, C., Campanario, M. L., Peña, D. D., Lara, J. A., Lizcano, D., & Martínez, M. (2018). Data mining for modeling students’ performance: A tutoring action plan to prevent academic dropout. Computers & Electrical Engineering, 66, 541–556.

  • Carnevale, J. B., & Hatak, I. (2020). Employee adjustment and well-being in the era of COVID-19: Implications for human resource management. Journal of Business Research, 116, 183–187.

  • Costa, E. B., Fonseca, B., Almeida Santana, M., de Araújo, F. F., & Rego, J. (2017). Evaluating the effectiveness of educational data mining techniques for early prediction of students’ academic failure in introductory programming courses. Computers in Human Behavior, 73, 247–256.

  • Chung, J., Ko, N., Kim, H., & Yoon, J. (2021). Inventor profile mining approach for prospective human resource scouting. Journal of Informatics, 15(1), 101–103.

  • Entezari, M. S. (2015). The role of education on labor productivity and quality management in education and business excellence model. In: 2nd International Conference on New Research in Management, Economics and Accounting. (In Persian).

  • Hand, D. J., Christen, P., & Kirielle, N. F. (2021). An interpretable transformation of the F-measure. Machine Learning, 110, 451–456.

  • Hatami, J. (2016). The challenge of teaching humanities in Iranian universities: A qualitative study. Journal of Research in Education Systems, 10(32), 234–273. (In Persian).

  • Helal, S., Li, J., Liu, L., Ebrahimie, E., Dawson, S., Murray, D. J., & Long, Q. (2018). Predicting academic performance by considering student heterogeneity. Knowledge-Based Systems, 161, 134–146.

  • Huber, S., Wiemer, H., Schneider, D., & Ihlenfeldt, S. (2019). DMME: Data mining methodology for engineering applications – a holistic extension to the CRISP-DM model. Procedia CIRP, 79, 403–408.

  • Imani, F., Aghabakhshi, H., & Ghaedi Mohammadi, M. J. (2013). The effect of short-term in-service training courses on the performance of municipal employees in Tehran’s District 7 in 1989. Case study. Social Research, 5(17), 29–46. (In Persian).

  • Liu, P., Qingqing, W., & Liu, W. (2021). Enterprise human resource management platform based on FPGA and data mining. Microprocessors and Microsystems, 80, 103330.

  • Liu, S., Jiang, H., Wu, Z., & Li, X. (2022). Data synthesis using deep feature enhanced generative adversarial networks for rolling bearing imbalanced fault diagnosis. Mechanical Systems and Signal Processing, 163, 108139.

  • Mills, K. E., Weary, D. M., & von Keyserlingk, M. A. G. (2021). Graduate student literature review: Challenges and opportunities for human resource management on dairy farms. Journal of Dairy Science, 104(1), 1192–1202.

  • Molaei, N., Goldar, Z., & Emdadifar, O. (2010). Investigating the relationship between in-service training and dimensions of human resource empowerment in staff and operations managers of Shazand Arak oil refinery. Human Resource Management in the Oil Industry, 1(4), 101–126.

  • Rahmani, K., & Daryadel, A. (2017). Modeling the qualification of human resources in organizations with the approach of neural networks. In: The Second International Conference on Management and Accounting, Tehran. (In Persian).

  • Tomasevic, N., Gvozdenovic, N., & Vranes, S. (2020). An overview and comparison of supervised data mining techniques for student exam performance prediction. Computers & Education, 143, 103676.

  • Zhou, H. F., Zhang, J. W., Zhou, Y. Q., Guo, X. J., & Ma, Y. (2021). A feature selection algorithm of decision tree based on feature weight. Expert Systems with Applications.


The authors acknowledge the support of the Iranian National Tax Administration.

Author information

Authors and Affiliations



MA wrote the literature review, gathered and analyzed the data. AB translated the manuscript. MK edited the manuscript.

Corresponding author

Correspondence to Mohammad Khalilzadeh.

Ethics declarations

Conflict of interest

There is no conflict of interest.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



Appendix 1

Data preparation

In the data processing and preparation stage, data cleansing operations were performed according to the data mining standard in order to improve the quality of the data set. At this stage, Excel and Modelling software were used to check the logic of the data, inconsistencies, outliers and missing data, and the following actions were taken.

  1. The personnel ID number field was excluded from the project variables because no computational operations are performed on it. This field serves only to link the datasets and has no role in data modeling as an input variable.

  2. Because the personnel number field contains only digits, the data mining software recognized it as numeric. The storage format of this field was therefore changed from numeric to string, and the decimal places introduced when the data were imported into the software were removed.

  3. Several personal and job information databases were received from the human resources department, and the educational information databases were transactional, so they had to be aggregated and integrated during data preparation. Thus, sequential merging was performed for the training data set, with the personnel number field as the key field.

  4. The personal, professional, and training data sets were aggregated in the first step. The resulting set was a record dataset with 473 records and 120 initial fields.

  5. The records of the "Tax assessor" group, the target population of this study, were separated from the other records, which reduced the integrated dataset to 370 records.

  6. The functional data set for the tax assessors, including the annual performance evaluation score and the number of suggestions submitted to the suggestion system from 90 to 95, was added to the software and merged with the integrated dataset of the first stage.

  7. In order to predict the amount of human resource promotion, a field called promotion score was prepared based on the information in the human resources dataset. The individual promotion score was determined from the number of promotions of each person during 2012–2017. It should be noted that the maximum number of promotions for tax staff is 4 degrees, from audit assistant to senior auditor.

  8. The datasets of the tax assessment tests of the tax assessors of the General Administration of Large Taxpayers were entered into the software and linked to the dataset accumulated in the previous steps, using the personnel ID number as the key field.

  9. After all data sets had been entered and aggregated, the number of fields for the 370 assessor records increased to 143.

  10. The first step of reducing variables (dimensions) was to remove 107 unrelated and unnecessary fields based on the opinions of the organization's experts, who deemed these fields unnecessary for solving the problems raised in this research. Thus, 36 main fields remained.
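The integration steps above (string-typed key, aggregation of transactional training rows, filtering to the target group, and field reduction) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' software workflow; every field name, the sample rows, and the helper functions are hypothetical.

```python
from collections import defaultdict


def normalize_id(value):
    """Store the personnel ID as a string key, dropping any decimal part added on import."""
    return str(int(float(value)))


def aggregate_training(transactions):
    """Collapse transactional training rows into one summary per personnel ID."""
    summary = defaultdict(lambda: {"course_count": 0, "course_years": []})
    for row in transactions:
        entry = summary[normalize_id(row["personnel_id"])]
        entry["course_count"] += 1
        entry["course_years"].append(row["course_year"])
    return dict(summary)


def build_dataset(personnel, training_summary, target_group, drop_fields):
    """Keep only the target group, join training data on the ID, and drop unneeded fields."""
    records = []
    for person in personnel:
        if person["job_group"] != target_group:
            continue
        key = normalize_id(person["personnel_id"])
        training = training_summary.get(key, {"course_count": 0, "course_years": []})
        merged = {**person, **training, "personnel_id": key}
        for field in drop_fields:
            merged.pop(field, None)
        records.append(merged)
    return records


def model_inputs(record):
    """The ID only links datasets, so it is excluded from the modeling inputs."""
    return {k: v for k, v in record.items() if k != "personnel_id"}


# Hypothetical sample rows.
personnel = [
    {"personnel_id": 1001.0, "job_group": "Tax assessor", "age": "41", "office_phone": "n/a"},
    {"personnel_id": 1002.0, "job_group": "Clerk", "age": "35", "office_phone": "n/a"},
]
training = [
    {"personnel_id": "1001", "course_year": 2013},
    {"personnel_id": 1001, "course_year": 2016},
]
dataset = build_dataset(personnel, aggregate_training(training), "Tax assessor", ["office_phone"])
```

Normalizing the key before every lookup is what makes the join robust to the numeric-versus-string mismatch described in item 2.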

After the data aggregation and integration operations, the following steps were taken to increase data quality:

  1. The format of the age variable (field) was converted from a string to a quantity (integer) in order to perform the necessary calculations.

  2. To improve the quality of modeling, two indicators were constructed: "average performance appraisal score during the period under review" and "suggestion system score during the period under review".

  3. These average scores of performance appraisal and suggestion system, computed from the scores of the two variables during 2012–2017, were introduced as two new variables replacing the previous performance appraisal and suggestion system variables.

  4. The course date variable was converted into the minimum and maximum dates of participation in training courses. Two important educational indicators were then created for each person: the time interval, in years, between the first and the last training course, and the time elapsed since the last training course.

  5. The date variable of the training courses was removed from the list of primary variables once the two key indicators had been derived from it.

  6. Only 19 people in total held a diploma or an associate's degree. These two levels were therefore combined into one degree, "Associate", in order to improve the quality of this variable.

  7. Because the field-of-study variable took many different values with varying frequencies, it was classified into economics, accounting, management, and other.

  8. Because only one record had the job title "audit assistant", this record was combined with the job title "tax auditor" to improve the quality of modeling. The position of representative of the organization on the tax dispute resolution board, however, was kept separate because of its importance, despite its small size (Appendix 1: Table 9).

  9. The human resources dataset contained 64 records with less than 1 year of service; this value was changed to one year during data preparation.

  10. The format of the record of service field was converted from a string to an integer, and conditional coding was applied to replace service of less than one year with one year (high threshold).

  11. Moreover, the variables "years" and "age" did not have a normal distribution, which could affect the data mining algorithms during model construction. Based on Appendix 1: Table 10, they were converted from continuous variables to ordered classes.

  12. Regarding outliers, the intervals considered were 3σ to 5σ for variables with a normal distribution and the quadratic range of 2–3 for variables with a non-normal distribution. In this evaluation, two outlier values were detected and replaced by the threshold values.

  13. After the outliers, the missing data were reviewed. Because an office automation system has been used for human resources over the last 5 years, human resources information has been stored in such a way that no significant missing data were observed. In the information system of the education department, however, the information of previous years could not be retrieved, because the educational information system was launched only in the last year. For example, 32% of the records of variables such as "training course scores" were empty in the staff training file database and could not be retrieved or counted. Because the scale of the scores was unknown (for example, the maximum score and the score of each lesson were not clear), no clear conclusion could be drawn from them. Thus, this variable, together with other useful variables that could have served as criteria for evaluating the courses, was omitted.
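Several of the quality steps above lend themselves to a short sketch: deriving the two training-interval indicators, raising short service records to one year, merging sparse categories, and replacing outliers with threshold values. The code below is illustrative only. It assumes the 3σ bound for the outlier rule (the lower end of the 3σ to 5σ interval mentioned in item 12), and all function names and sample values are hypothetical.

```python
from statistics import mean, pstdev


def training_intervals(course_years, reference_year):
    """Return (years between first and last course, years since the last course)."""
    first, last = min(course_years), max(course_years)
    return last - first, reference_year - last


def floor_service_years(raw_value, minimum=1):
    """Convert a string service record to an integer and raise values below the minimum."""
    return max(int(float(raw_value)), minimum)


def merge_levels(value, mapping):
    """Merge sparse categories into a broader one; unmapped values are kept unchanged."""
    return mapping.get(value, value)


def cap_outliers(values, k=3.0):
    """Replace values beyond mean +/- k standard deviations with the threshold itself."""
    mu, sigma = mean(values), pstdev(values)
    low, high = mu - k * sigma, mu + k * sigma
    return [min(max(v, low), high) for v in values]


# Hypothetical values: nineteen typical scores and one extreme score.
scores = [10] * 19 + [100]
capped = cap_outliers(scores)

degree_map = {"Diploma": "Associate"}            # item 6: combine two small levels
title_map = {"audit assistant": "tax auditor"}   # item 8: fold a single-record title
```

Capping at the threshold, rather than deleting the record, keeps the record count unchanged, which matters for a dataset of only a few hundred records.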

Table 9 "Organizational post title" Variable in the project data set
Table 10 Leveling of “organizational service record” variable
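The leveling of a continuous variable such as the service record into ordered classes can be sketched as follows. Because the class boundaries of Table 10 are not reproduced here, the edges used below are purely hypothetical placeholders, as is the function name.

```python
from bisect import bisect_right


def to_ordinal_class(value, edges):
    """Map a continuous value to a 1-based ordered class.

    `edges` must be sorted in ascending order. A value equal to an edge
    falls into the higher class, so class 1 holds values below edges[0].
    """
    return bisect_right(edges, value) + 1


# Hypothetical boundaries in years of service (NOT the values of Table 10).
service_edges = [5, 10, 20]
service_years = [3, 5, 12, 25]
service_classes = [to_ordinal_class(y, service_edges) for y in service_years]
```

With these placeholder edges the four sample values fall into classes 1, 2, 3 and 4 respectively.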

Appendix 2

See Table 11.

Table 11 Proposed algorithm (C5.0) for target variables

Appendix 3

See Table 12.

Table 12 Performance of target variables under default and advanced settings mode

Appendix 4

See Fig. 1.

Fig. 1 Decision tree model for the "Lay-Off" target variable, with a depth of 11 and an accuracy of 73.49% on the training data

Appendix 5

See Table 13.

Table 13 Ranking of influential variables in the formation of the selected predictor tree

Appendix 6

See Table 14.

Table 14 Extracted important rules for significant classes of target variables


Cite this article

Arfaee, M., Bahari, A. & Khalilzadeh, M. A novel prediction model for educational planning of human resources with data mining approach: a national tax administration case study. Educ Inf Technol 27, 2209–2239 (2022).



  • Human resource management
  • Educational planning
  • Data mining
  • Decision tree
  • Neural network
  • Case study