Abstract
We present a computer program named Datafly that maintains anonymity in medical data by automatically generalizing, substituting, inserting, and removing information as appropriate, without losing many of the details found within the data. Decisions are made at the field and record level at the time of database access, so the approach can be used on the fly in role-based security within an institution and in batch mode for exporting data from an institution. Organizations often release and receive medical data with all explicit identifiers, such as name, address, phone number, and Social Security number, removed, in the incorrect belief that patient confidentiality is maintained because the resulting data look anonymous. However, we show that in most of these cases the remaining data can be used to re-identify individuals, either by linking or matching the data to other databases or by looking at unique characteristics found in the fields and records of the database itself. When these less apparent aspects are taken into account, each released record can be made to map ambiguously to many possible people, providing a level of anonymity that the user determines.
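To make the idea of a user-determined anonymity level concrete, the following Python sketch generalizes two fields until every released combination of values is shared by at least k records, and withholds any records that still stand out. It is an illustration only and is not taken from the chapter: the field names (`zip`, `birth_year`), the generalization levels, the rule of coarsening the field with the most distinct values, and the function `anonymize` are all assumptions made for this example.

```python
from collections import Counter

MAX_LEVEL = 3  # assumed: level 3 hides a field almost entirely


def generalize_zip(zip_code, level):
    """Replace the last `level` digits of a ZIP code with '*'."""
    level = max(0, min(level, len(zip_code)))
    return zip_code[:len(zip_code) - level] + "*" * level


def generalize_year(year, level):
    """0: exact year, 1: 5-year band, 2: decade, 3 or more: fully masked."""
    if level <= 0:
        return str(year)
    if level >= 3:
        return "*"
    width = 5 if level == 1 else 10
    start = year - year % width
    return f"{start}-{start + width - 1}"


# Hypothetical fields that could be linked to outside data.
GENERALIZERS = {"zip": generalize_zip, "birth_year": generalize_year}


def anonymize(records, k):
    """Generalize fields until each value combination occurs at least k times."""
    fields = list(GENERALIZERS)
    levels = {field: 0 for field in fields}

    def view(record):
        return tuple(GENERALIZERS[f](record[f], levels[f]) for f in fields)

    while True:
        counts = Counter(view(r) for r in records)
        if all(c >= k for c in counts.values()):
            break
        candidates = [f for f in fields if levels[f] < MAX_LEVEL]
        if not candidates:
            break  # nothing left to generalize

        def distinct(field):
            i = fields.index(field)
            return len({combo[i] for combo in counts})

        # Assumed heuristic: coarsen the field with the most distinct values.
        levels[max(candidates, key=distinct)] += 1

    counts = Counter(view(r) for r in records)
    released = []
    for r in records:
        combo = view(r)
        if counts[combo] >= k:  # records that are still too rare are withheld
            out = dict(r)
            out.update(zip(fields, combo))
            released.append(out)
    return released


records = [
    {"zip": "02138", "birth_year": 1964, "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1964, "diagnosis": "flu"},
    {"zip": "02141", "birth_year": 1967, "diagnosis": "diabetes"},
    {"zip": "02142", "birth_year": 1968, "diagnosis": "fracture"},
]
released = anonymize(records, k=2)
```

With k=2, the sample records keep their diagnoses, but the ZIP codes lose their last digit and the birth years become five-year bands, so each released record matches at least one other. Raising k trades detail for anonymity; if k exceeds the number of records, this sketch releases nothing.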
Copyright information
© 1998 IFIP
About this chapter
Cite this chapter
Sweeney, L. (1998). Datafly: a system for providing anonymity in medical data. In: Lin, T.Y., Qian, S. (eds) Database Security XI. IFIP Advances in Information and Communication Technology. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-35285-5_22
DOI: https://doi.org/10.1007/978-0-387-35285-5_22
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-5041-2914-5
Online ISBN: 978-0-387-35285-5
eBook Packages: Springer Book Archive