Personally Identifiable Data Field Checking Using Machine Learning

  • Yu-Chih Wei
  • Wei-Chen Wu
  • Ya-Chi Chu
Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 551)


Privacy impact assessments (PIAs) are the approach organizations most frequently use to control privacy risks. However, automatically checking personally identifiable information (PII) data fields in PIA results remains a major challenge. Manual verification of PIA results by information security consultants is time-consuming and costly. In addition, consultants have difficulty detecting errors and anomalies in the data because their concentration wavers after a certain period of time. In this paper, we propose a methodology for checking PII data fields that can greatly reduce human errors and improve the quality of PIA reports.


Privacy impact assessment; Abnormal detection; Personally identifiable information



This research is supported by Telecommunication Laboratories, Chunghwa Telecom Co., Ltd.



Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. Department of Information and Finance Management, National Taipei University of Technology, Taipei, Taiwan, R.O.C.
  2. Computer Center, Hsin Sheng Junior College of Medical Care and Management, Taoyuan, Taiwan, R.O.C.
  3. Telecommunication Laboratories, Chunghwa Telecom Co., Ltd., Taoyuan, Taiwan, R.O.C.
