Abstract
We present a technique for automatically identifying clinically relevant patterns in medical datasets. To preserve patient privacy, we propose and implement the idea of treating a medical dataset as a black box for both internal and external users of the data. Unlike the conventional approach, which relies on a data de-identification process, the proposed approach handles clinical data queries directly on a given medical dataset. Our integrated toolkit combines software engineering technologies such as Java EE and RESTful web services, which allows medical data to be exchanged in an unidentifiable XML format and restricts users to computed information. With existing techniques, an adversary applying advanced computational methods could succeed in re-identifying data; we therefore disallow retrospective processing of the data. We validate our approach on an endoscopic reporting application based on the openEHR and MST standards. The implemented prototype can be used by clinical researchers and by governmental or non-governmental organizations to query datasets when monitoring health care services to improve quality of care.
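The black-box idea can be illustrated with a minimal sketch. It is not the chapter's Java EE implementation: the record fields, the class name `BlackBoxDataset`, the method `count_findings_xml`, and the XML element names are all hypothetical, chosen only to show a dataset that keeps its records private and returns nothing but a computed count in XML.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EndoscopyRecord:
    # Hypothetical fields; the real openEHR/MST schema is not shown in the source.
    patient_name: str  # identifying field, never returned to the caller
    age: int
    finding: str


class BlackBoxDataset:
    """Keeps records private and answers only declarative aggregate queries."""

    def __init__(self, records: List[EndoscopyRecord]):
        self._records = list(records)

    def count_findings_xml(self, finding: str, min_age: int, max_age: int) -> str:
        # The query is a set of plain parameters, so caller code never touches a record.
        matches = sum(
            1
            for r in self._records
            if r.finding == finding and min_age <= r.age <= max_age
        )
        # Only the computed value and the query parameters go into the response.
        root = ET.Element("queryResult")
        ET.SubElement(root, "finding").text = finding
        ET.SubElement(root, "ageRange", {"min": str(min_age), "max": str(max_age)})
        ET.SubElement(root, "count").text = str(matches)
        return ET.tostring(root, encoding="unicode")


dataset = BlackBoxDataset([
    EndoscopyRecord("Patient A", 54, "gastric ulcer"),
    EndoscopyRecord("Patient B", 61, "gastric ulcer"),
    EndoscopyRecord("Patient C", 47, "polyp"),
])
result_xml = dataset.count_findings_xml("gastric ulcer", 50, 70)
print(result_xml)
```

In a deployment along the lines the abstract describes, a method like this would presumably sit behind a RESTful endpoint, so that users only ever receive the XML response and never the underlying rows.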
Notes
1. De-identification is defined as a technology that deletes or removes identifying information, such as name and SSN, from the released information and suppresses or generalizes quasi-identifiers, such as zip code and date of birth, to ensure that medical data is not re-identifiable (re-identification being the reverse of de-identification).
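For contrast with the black-box approach, the conventional de-identification step defined in the note can be sketched as follows. This is an illustration under assumptions: the field names, the three-digit zip prefix, and the reduction of a date of birth to its year are hypothetical choices, not rules taken from the chapter.

```python
from typing import Dict

# Assumed set of direct identifiers that are removed outright.
DIRECT_IDENTIFIERS = {"name", "ssn"}


def deidentify(record: Dict[str, str]) -> Dict[str, str]:
    """Drop direct identifiers and generalize quasi-identifiers."""
    released = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "zip_code" in released:
        # Generalize: keep the first three digits, mask the rest.
        released["zip_code"] = released["zip_code"][:3] + "**"
    if "date_of_birth" in released:
        # Generalize: keep only the year of an ISO-formatted date (YYYY-MM-DD).
        released["date_of_birth"] = released["date_of_birth"][:4]
    return released


record = {
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "zip_code": "37235",
    "date_of_birth": "1970-04-12",
    "finding": "polyp",
}
released = deidentify(record)
print(released)
```

The released record still contains row-level data, which is why the abstract argues that such data can remain open to re-identification attempts.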
Acknowledgments
The work presented in this paper was funded through the National Science Foundation (NSF) TRUST (The Team for Research in Ubiquitous Secure Technology) Science and Technology Center, Grant Number CCF-0424422. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NSF. This work has been partly supported by the project SAFEHR of the Macao Science and Technology Development Fund (MSTDF) under grant 018/2011/AI.
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Qamar, N., Yang, Y., Nadas, A., Liu, Z., Sztipanovits, J. (2017). A Tool for Analyzing Clinical Datasets as Blackbox. In: Huhn, M., Williams, L. (eds) Software Engineering in Health Care. SEHC 2014, FHIES 2014. Lecture Notes in Computer Science, vol 9062. Springer, Cham. https://doi.org/10.1007/978-3-319-63194-3_15
DOI: https://doi.org/10.1007/978-3-319-63194-3_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63193-6
Online ISBN: 978-3-319-63194-3
eBook Packages: Computer Science, Computer Science (R0)