A Tool for Analyzing Clinical Datasets as Blackbox

Qamar, Nafees; Yang, Yilong; Nadas, Andras; Liu, Zhiming; Sztipanovits, Janos

doi:10.1007/978-3-319-63194-3_15

Part of the book series: Lecture Notes in Computer Science (LNPSE, volume 9062)

Abstract

We present a technique for the automatic identification of clinically relevant patterns in medical datasets. To preserve patient privacy, we propose and implement the idea of treating a medical dataset as a black box for both internal and external users of the data. Unlike the conventional approach, which relies on data de-identification, the proposed approach handles clinical data queries directly on a given medical dataset. Our integrated toolkit combines software engineering technologies such as Java EE and RESTful web services; it exchanges medical data in an unidentifiable XML format and restricts users to computed information. Because existing techniques could allow an adversary to re-identify data by applying advanced computational techniques, we disallow retrospective processing of data. We validate our approach on an endoscopic reporting application based on the openEHR and MST standards. The implemented prototype system can be used by clinical researchers and by governmental or non-governmental organizations to query datasets when monitoring health care services to improve quality of care.
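The black-box idea in the abstract can be illustrated with a small sketch. It is not the authors' toolkit: the class `BlackBoxDataset`, the method `countAsXml`, the XML element names, and the minimum-count suppression threshold are all assumptions made for illustration, and plain Java stands in for the Java EE and RESTful service layer. The point it shows is only that callers pass query parameters and get back a computed value, never the underlying records.

```java
import java.util.List;

// Hypothetical sketch: class, method, and XML element names are illustrative only.
class BlackBoxDataset {

    // Toy stand-in for an endoscopy report; not the openEHR/MST data model.
    static final class Report {
        final String patientName;
        final int age;
        final String finding;

        Report(String patientName, int age, String finding) {
            this.patientName = patientName;
            this.age = age;
            this.finding = finding;
        }
    }

    private final List<Report> records; // raw records are never returned to callers
    private final int minCount;         // assumed threshold below which answers are withheld

    BlackBoxDataset(List<Report> records, int minCount) {
        this.records = List.copyOf(records);
        this.minCount = minCount;
    }

    // Callers supply query parameters only and receive a computed count as XML.
    String countAsXml(String finding, int minAge, int maxAge) {
        long count = records.stream()
                .filter(r -> r.finding.equals(finding))
                .filter(r -> r.age >= minAge && r.age <= maxAge)
                .count();
        if (count < minCount) {
            // Very small counts could single out a patient, so no number is released.
            return "<result status=\"suppressed\"/>";
        }
        return "<result status=\"ok\"><count>" + count + "</count></result>";
    }

    public static void main(String[] args) {
        BlackBoxDataset dataset = new BlackBoxDataset(List.of(
                new Report("Patient A", 54, "gastritis"),
                new Report("Patient B", 61, "gastritis"),
                new Report("Patient C", 47, "ulcer")), 2);
        System.out.println(dataset.countAsXml("gastritis", 50, 70));
        System.out.println(dataset.countAsXml("ulcer", 40, 50));
    }
}
```

In a deployment along the lines the abstract describes, a method like `countAsXml` would presumably sit behind a RESTful endpoint, so that the XML response is the only thing that leaves the server.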

Notes

1. De-identification is defined as a technology that deletes or removes identifiable information, such as name and SSN, from the released information and suppresses or generalizes quasi-identifiers, such as zip code and date of birth, to ensure that medical data is not re-identifiable (re-identification being the reverse process of de-identification).
2. http://gastros.codeplex.com.
3. http://www.privacyanalytics.ca/software/.
4. http://www.qpidhealth.com.
5. http://gastros.codeplex.com.

Acknowledgments

The work presented in this paper was funded through the National Science Foundation (NSF) TRUST (The Team for Research in Ubiquitous Secure Technology) Science and Technology Center, Grant Number CCF-0424422. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NSF. This work has been partly supported by the project SAFEHR of the Macao Science and Technology Development Fund (MSTDF) under grant 018/2011/AI.

Author information

Authors and Affiliations

Institute for Software Integrated Systems, Vanderbilt University, Nashville, USA
Nafees Qamar, Andras Nadas & Janos Sztipanovits
University of Macao, Zhuhai, China
Yilong Yang
Birmingham City University, Birmingham, UK
Zhiming Liu

Corresponding author

Correspondence to Nafees Qamar.

Editor information

Editors and Affiliations

TU Clausthal, Clausthal-Zellerfeld, Germany
Michaela Huhn
North Carolina State University, Raleigh, North Carolina, USA
Laurie Williams

About this paper

Cite this paper

Qamar, N., Yang, Y., Nadas, A., Liu, Z., Sztipanovits, J. (2017). A Tool for Analyzing Clinical Datasets as Blackbox. In: Huhn, M., Williams, L. (eds) Software Engineering in Health Care. SEHC 2014, FHIES 2014. Lecture Notes in Computer Science, vol 9062. Springer, Cham. https://doi.org/10.1007/978-3-319-63194-3_15

DOI: https://doi.org/10.1007/978-3-319-63194-3_15
Published: 27 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-63193-6
Online ISBN: 978-3-319-63194-3
eBook Packages: Computer Science, Computer Science (R0)
