A Tool for Analyzing Clinical Datasets as Blackbox

  • Conference paper
Software Engineering in Health Care (SEHC 2014, FHIES 2014)

Abstract

We present a technique for automatically identifying clinically relevant patterns in medical datasets. To preserve patient privacy, we propose and implement the idea of treating a medical dataset as a black box for both internal and external users of the data. Unlike the conventional approach, which relies on de-identifying the data before release, the proposed approach answers clinical data queries directly on the given medical dataset. Our integrated toolkit combines software engineering technologies such as Java EE and RESTful web services; it exchanges medical data in an unidentifiable XML format and restricts users to computed information. Because an adversary who applies advanced computational techniques may still succeed in re-identifying data released by existing techniques, we disallow retrospective processing of the data. We validate our approach on an endoscopic reporting application based on the openEHR and MST standards. Clinical researchers, as well as governmental and non-governmental organizations that monitor health care services to improve quality of care, can use the implemented prototype system to query datasets.
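The black-box idea can be illustrated with a small sketch: records stay inside the service, and a caller can only name an aggregate operation, which the service evaluates internally and answers with a computed value serialized as XML. This is a minimal Python sketch under our own assumptions. The class `BlackBoxDataset`, the `Record` fields, the two supported operations, and the `<result>` element are hypothetical; they are not the toolkit's Java EE/RESTful interface or its openEHR-based schema, and the records are toy data.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Record:
    # Hypothetical endoscopy-report fields; the name is never returned to callers.
    patient_name: str
    age: int
    finding: str


class BlackBoxDataset:
    """Holds records privately and answers only aggregate queries (sketch)."""

    # Only computed results are offered; no operation returns individual rows.
    _OPERATIONS: Dict[str, Callable[[List[Record]], float]] = {
        "count": lambda rows: float(len(rows)),
        # Returns 0.0 when nothing matches (a simplification for the sketch).
        "mean_age": lambda rows: sum(r.age for r in rows) / len(rows) if rows else 0.0,
    }

    def __init__(self, records: List[Record]) -> None:
        self.__records = list(records)  # name-mangled to discourage direct access

    def query(self, operation: str, finding: str) -> str:
        """Aggregate over records with the given finding and return an XML string."""
        if operation not in self._OPERATIONS:
            raise ValueError(f"unsupported operation: {operation}")
        matching = [r for r in self.__records if r.finding == finding]
        value = self._OPERATIONS[operation](matching)
        result = ET.Element("result", {"operation": operation, "finding": finding})
        result.text = str(value)
        return ET.tostring(result, encoding="unicode")


if __name__ == "__main__":
    ds = BlackBoxDataset([
        Record("Alice", 54, "polyp"),
        Record("Bob", 61, "polyp"),
        Record("Carol", 47, "ulcer"),
    ])
    print(ds.query("count", "polyp"))  # <result operation="count" finding="polyp">2.0</result>
```

The design point the sketch tries to capture is that the class exposes no method returning rows, so internal and external callers alike see only computed values; a real deployment would place such an interface behind an authenticated web service.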

Notes

  1. De-identification is defined as a technology that deletes or removes identifying information, such as name and SSN, from the released information and suppresses or generalizes quasi-identifiers, such as zip code and date of birth, to ensure that medical data cannot be re-identified (re-identification being the reverse process of de-identification).

  2. http://gastros.codeplex.com.

  3. http://www.privacyanalytics.ca/software/.

  4. http://www.qpidhealth.com.

  5. http://gastros.codeplex.com.
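For contrast with the black-box approach, the conventional de-identification step defined in note 1 can be sketched as follows: drop direct identifiers such as name and SSN, then generalize quasi-identifiers such as zip code and date of birth. The field names and the generalization rules (keep a three-digit zip prefix, keep only the birth year) are illustrative assumptions, not the rules of any particular de-identification tool.

```python
from typing import Dict

# Hypothetical field names for direct identifiers; adjust to the actual schema.
DIRECT_IDENTIFIERS = {"name", "ssn"}


def generalize_zip(zip_code: str) -> str:
    """Keep the first three digits and mask the rest (illustrative rule)."""
    return zip_code[:3] + "*" * max(len(zip_code) - 3, 0)


def generalize_dob(dob: str) -> str:
    """Reduce an ISO date (YYYY-MM-DD) to its year (illustrative rule)."""
    return dob[:4]


def deidentify(record: Dict[str, str]) -> Dict[str, str]:
    """Suppress direct identifiers and generalize quasi-identifiers."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "zip" in out:
        out["zip"] = generalize_zip(out["zip"])
    if "dob" in out:
        out["dob"] = generalize_dob(out["dob"])
    return out
```

For example, `deidentify({"name": "Jane Doe", "ssn": "123-45-6789", "zip": "37203", "dob": "1961-04-12", "finding": "polyp"})` yields `{"zip": "372**", "dob": "1961", "finding": "polyp"}`. The generalized quasi-identifiers still travel with each released record and can be linked against external data, which is the re-identification risk the abstract gives as the reason for answering queries without releasing records.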


Acknowledgments

The work presented in this paper was funded through the National Science Foundation (NSF) TRUST (The Team for Research in Ubiquitous Secure Technology) Science and Technology Center, Grant Number CCF-0424422. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NSF. This work has been partly supported by the SAFEHR project of the Macao Science and Technology Development Fund (MSTDF) under grant 018/2011/AI.

Author information

Corresponding author

Correspondence to Nafees Qamar.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Qamar, N., Yang, Y., Nadas, A., Liu, Z., Sztipanovits, J. (2017). A Tool for Analyzing Clinical Datasets as Blackbox. In: Huhn, M., Williams, L. (eds) Software Engineering in Health Care. SEHC FHIES 2014. Lecture Notes in Computer Science, vol 9062. Springer, Cham. https://doi.org/10.1007/978-3-319-63194-3_15

  • DOI: https://doi.org/10.1007/978-3-319-63194-3_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-63193-6

  • Online ISBN: 978-3-319-63194-3

  • eBook Packages: Computer Science, Computer Science (R0)
