Abstract
Electronic Health Records (EHRs) are frequently used in Health Information Exchanges to fuse data belonging to the same patients, via their demographic attributes, for public health informatics. Fusing this information across multiple health care entities is complex in two ways. First, privacy constraints on sharing demographic information across organizations are stringent, so records must be encrypted or hashed to preserve anonymity. Second, fusing anonymized data raises the problem of finding duplicate records and accurately linking incoming information to existing records. This paper presents a methodology by which any public health department office can acquire health data while preserving the privacy, integrity and usefulness of that data. Our novel duplicate detection algorithm combines cryptographic hashing and machine learning techniques to link patients’ records approximately by identifying duplicate and unique records. Experimental results on three different datasets show that the proposed methodology effectively detects duplicates based on encoded demographic data from EHRs. In addition, the proposed methodology can potentially be applied to record matching in other domains with encoded data.
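The abstract does not specify the encoding or the classifier, so the following is only a minimal Python sketch of the general idea of matching hashed demographic records with a decision tree. It assumes a keyed hash (HMAC-SHA256) with a secret shared by the participating sites, a hypothetical five-field schema, hashed three-character name prefixes to tolerate small misspellings, and a hand-written tree in place of a trained one. The names `SECRET_KEY`, `FIELDS`, `normalize`, `keyed_hash`, `encode_record`, `compare` and `is_duplicate` are illustrative and are not taken from the paper.

```python
import hashlib
import hmac

# Hypothetical secret shared by all participating sites; without it, an
# outsider cannot recompute the hashes to test guessed names or dates.
SECRET_KEY = b"replace-with-a-securely-distributed-key"

# Hypothetical demographic schema used for linkage.
FIELDS = ("first_name", "last_name", "dob", "gender", "zip")


def normalize(value: str) -> str:
    """Lower-case and keep only letters/digits so formatting differences
    (case, spaces, dashes, apostrophes) do not change the hash."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def keyed_hash(value: str) -> str:
    """HMAC-SHA256 of the normalized value, as a 64-character hex string."""
    return hmac.new(SECRET_KEY, normalize(value).encode("utf-8"),
                    hashlib.sha256).hexdigest()


def encode_record(record: dict) -> dict:
    """Replace every demographic field with its keyed hash. Hashes of the
    first three characters of each name are added so that names with small
    spelling differences can still partially agree after hashing."""
    encoded = {f: keyed_hash(record[f]) for f in FIELDS}
    encoded["first_prefix"] = keyed_hash(normalize(record["first_name"])[:3])
    encoded["last_prefix"] = keyed_hash(normalize(record["last_name"])[:3])
    return encoded


def compare(a: dict, b: dict) -> dict:
    """Field-wise agreement vector of two encoded records (True = equal hash)."""
    return {key: a[key] == b[key] for key in a}


def is_duplicate(agree: dict) -> bool:
    """Hand-written decision tree standing in for one learned from labeled
    record pairs; the splits are illustrative only."""
    if not agree["dob"]:
        return False
    if agree["last_name"] or agree["last_prefix"]:
        if agree["first_name"] or agree["first_prefix"]:
            return True
        return agree["gender"] and agree["zip"]
    return agree["first_name"] and agree["gender"] and agree["zip"]


if __name__ == "__main__":
    existing = encode_record({"first_name": "Jonathan", "last_name": "Smith",
                              "dob": "1980-02-14", "gender": "M", "zip": "44000"})
    incoming = encode_record({"first_name": "Jonathon", "last_name": "SMITH",
                              "dob": "1980-02-14", "gender": "m", "zip": "44000"})
    print(is_duplicate(compare(existing, incoming)))  # True
```

Because equal hashes reveal only exact agreement, the prefix hashes are what give this sketch any tolerance to misspellings; they also shrink the space of possible inputs and so leak more than full-field hashes, which is one reason the key must stay secret. In practice the tree's splits would be learned from labeled duplicate and non-duplicate pairs rather than written by hand.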
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Khalique, F., Khan, S.A., Mubarak, Qua., Safdar, H. (2019). Decision Tree-Based Anonymized Electronic Health Record Fusion for Public Health Informatics. In: Arai, K., Kapoor, S., Bhatia, R. (eds) Intelligent Computing. SAI 2018. Advances in Intelligent Systems and Computing, vol 858. Springer, Cham. https://doi.org/10.1007/978-3-030-01174-1_30
DOI: https://doi.org/10.1007/978-3-030-01174-1_30
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01173-4
Online ISBN: 978-3-030-01174-1
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)