South Korea: in the midst of a privacy reform centered on data sharing

With rapid developments in genomic and digital technologies, genomic data sharing has become a key issue for the achievement of precision medicine in South Korea. The legal and administrative framework for data sharing and protection in this country is currently under intense scrutiny from national and international stakeholders. Policymakers are assessing the relevance of specific restrictions in national laws and guidelines for better alignment with international approaches. This manuscript will consider key issues in international genome data sharing in South Korea, including consent, privacy, security measures, compatible adequacy and oversight, and map out an approach to genomic data sharing that recognizes the importance of patient engagement and responsible use of data in South Korea.


Introduction
With rapid developments in genomic and digital technologies, genomic data sharing has become key to achieving precision medicine (National Research Council 2011). One of the main shifts comes from the field of epigenetics, which offers new opportunities for genomic research, disease prevention (Perkins et al. 2018) and precision medicine (Grossman et al. 2016). Following these advances, South Korea made data sharing for precision medicine a priority in its political agenda (Korea Institute of S & T Evaluation and Planning 2017). Two decades ago, the initial biobanks in South Korea were population-based cohort studies. Initiated in 2008, the Korea Biobank Network (KBN) now includes 17 regional biobanks, and the National Biobank of Korea (NBK) systematically collects human bioresources. The NBK has collected, managed and distributed human biospecimens and data through the Biospecimen Information Management System (BIMS) (Korea National Institute of Health 2017). Today, the leading agency of the KBN and the control center of the NBK is the Korea Centers for Disease Control and Prevention (KCDC). It is currently planning a large-scale precision medicine databank (Chu 2017; Korea Institute of S & T Evaluation and Planning 2017).
Various genomic data sharing policies exist at the international level (Global Alliance for Genomics and Health 2014; OECD 2007; UNESCO 2003). South Korea is now also in a position to utilize its abundant national data resources in healthcare and join international research collaborations. Clinical data are deposited in electronic medical records (EMRs) in 92% of hospitals (Korea Ministry of Health and Welfare 2016), and the records of hospitalization and medical claims are transmitted to the National Health Insurance Corporation and the Health Insurance Review and Assessment Service. Moreover, there is a multi-ministerial effort to integrate public health data with other public data, such as geospatial data collected by the Ministry of Land, Infrastructure and Transportation or Statistics Korea; environmental and satellite data from the Meteorological Administration or from the Ministry of Environment; population census, household income and expenditure survey data from Statistics Korea; as well as birth- and death-related data from the Ministry of the Interior and Safety (Kim et al. 2017; Gang 2016). Through this national initiative, the government aims to enable high accessibility and use of data among South Korean researchers (Chu 2017) in a similar fashion to the UK Biobank (Sudlow et al. 2015), the Canadian Partnership for Tomorrow Project (Drummer et al. 2018) and the All of Us research program in the U.S. (Devaney 2017).
There are also growing interconnections between the public and commercial domains associated with precision medicine. Since 2016, South Korea has permitted genetic testing outside of the clinical setting, including private Direct-to-Consumer Genetic Testing (DTC-GT) (Jeong 2017). DTC-GT regulations in the revision of the Bioethics and Safety Act of 2005 (BSA) (2017) apply to commercial genomic data sharing and public-private partnerships. Major hospitals also collaborate to develop their own big data for clinical research. For example, in 2015 the Asan Medical Center, one of the major private hospitals in South Korea, developed a research information system with more than 4 million registered patients, which represents a first model of big data research in hospitals (Shin et al. 2015).
The integration of real-life data generated from personal mobile and wearable devices into research will enrich precision medicine. Patients and citizens can be considered data providers. In that sense, individuals can share lifelogs and biometric information such as diet, sleep, physical activity and movement with any database through digital technologies. Given the potential of these new tools, active public engagement is becoming a crucial component of successfully increasing participation in genomic research (Borry et al. 2018). Moreover, there is substantial evidence that active public engagement and the sharing of personal life data are, overall, beneficial to individuals as digital health consumers (McMahon and Pan 2018; Parker et al. 2018; Willink and DuGoff 2018; Rathbone and Prescott 2017). Thus, the success of data-intensive models for precision medicine relies largely on public trust and positive anticipation of research benefits.
Emerging international trends in data-sharing initiatives have brought new opportunities along with ethical, legal and political challenges (Robinson et al. 2015). Data can potentially be re-identified, which has led to cases of illicit data misuse (Phillips et al. 2017). Personal information stored in a database cannot be destroyed if it has been copied or shared with others. Indeed, there are concerns associated with data governance and data sharing, such as privacy, security and liability, as well as commercial use (Allen et al. 2014), raising the need to review Korean data governance policies regarding the different size, scope, complexity, scale and forms of genomic data sharing on the horizon. Accordingly, the South Korean government has consolidated its data protection legislation, which will also apply to international genomic data sharing (Fig. 1). Despite these legislative changes, the BSA and the Personal Information Protection Act of 2011 (PIPA) (2017) remain relevant and applicable. This manuscript considers key issues in international genomic data sharing in South Korea, including consent, privacy, security measures, compatible adequacy and oversight. It maps out an approach to genomic data sharing that recognizes the importance of both patient engagement and the responsible use of genomic data in South Korea.

Consent
The PIPA is a general law regarding the processing and protection of personal data (Art 1). According to the PIPA, a data subject is defined as "an individual who is identifiable by the information processed hereby becomes the subject of that information" (Art 2-3) and any party that processes personal information directly or indirectly to utilize the personal information is defined as a personal information controller, such as public institutions, corporations and a data user or a researcher (Art 2-5).
With regard to consent, a personal information controller bears certain responsibilities under the PIPA. A data controller should obtain the data subject's consent to the collection and use of personal information (Art 15). The PIPA requires that a data subject providing her/his personal information be notified by the data controller regarding the purpose of the collection and use of this information, the type of personal information being collected, the period of holding and usage of the personal information, and the right to refuse or withdraw consent. The data-sharing activities of a personal information controller are limited by the PIPA. A personal information controller must collect and identify the minimum personal information necessary to achieve the purpose of the collection (Art 16-1). The controller should then inform the data subject of the minimum personal information that is required so that the data subject may withhold her/his consent regarding the collection of information beyond this minimal dataset (Art 16-2). Additionally, the party collecting the information cannot refuse to provide goods or services to a person if he/she does not agree with the provision of information other than the required information (Art 16-3). However, it is challenging to clearly distinguish what constitutes the minimum information required from information that goes beyond the required information. Unfortunately, the PIPA does not provide additional guidance on this distinction. In other words, according to the PIPA, it is unclear which is the personal information that can be freely used by corporations and public institutions and which is the personal information that can only be used by obtaining the specific consent of data providers.
These principles apply consistently, but with different criteria, including the possibility of broad consent for biobanks and genomic research, to parties collecting genetic resources under the BSA. In a biobank context, it is mandatory to obtain written consent from the biospecimen donor, and the range of researchers and institutions party to the data sharing must be indicated in the consent form (Art 42-1-3). After a biobank collects biospecimens with such consent, no further consent is necessary for further genomic data sharing. The biobank is responsible for reviewing the research proposals submitted by third parties and decides whether to provide biospecimens or genomic data (Art 43-1). If a genomic researcher plans to collect biospecimens or genomic data directly from research participants, written consent should be obtained along with information on plans concerning genomic data sharing (Art 37-1-4). The research proposal, including the consent form and a plan for genomic data sharing, should also be approved by an Institutional Bioethics Committee (IBC) (Art 36-1), and the researcher can only provide the biospecimens or data to a biobank or to other researchers after receiving this approval (Art 38-1). E-consent is accepted as an alternative to written consent under the BSA (Art 16-1). For biobank projects, the specific purpose of genomic research only needs to be mentioned in the consent form when a biobank is established to carry out a specific genomic research project, such as research on pulmonary cancer or leukemia (Art 42-1-1). Generally, biobanks seek consent for further data and biospecimen use without mentioning a specific research purpose (BSA Enforcement Ordinance, Annex 41). This broad consent model is now acceptable under the BSA (Kim et al. 2017; You and Kim 2015).
Although broad consent is generally used in biobanking, there are some challenges when applying this type of consent to biobanking for precision medicine. Foremost, legal guidelines on consent to use collected clinical data and biological samples that are part of medical care records are unclear. Article 21-2 of the Medical Service Act of 1962 (MSA) (2017) does not allow a medical provider, the head of a hospital or a person working at a hospital to release a patient's medical record to a third party, including the patient's spouse, lineal ascendant or descendant, or sibling. This provision is applied to prevent any disclosure or transmission of a patient's data for research purposes to a third party without proper consent. Researchers can use clinical data only if they satisfy the consent requirements of both the PIPA and the BSA. According to the PIPA, a hospital can provide clinical data to a third party when patients have consented to provide the data (Art 17-1, 2) and an additional separate consent is needed for cross-border data sharing (Art 17-3). A researcher can also share anonymized clinical data with the research community if the initial research plan is approved by the IBC and if the patient agreed to this sharing on the initial consent form (BSA 18-1, 2). When the data are shared, a second approval by the IBC is needed. A hospital, as a personal information provider, can provide clinical data without consent only if the data are de-identified and used for statistics or academic research (Art 18-2-4). The BSA does not explain whether de-identification is the same concept as anonymization (Art 38-2); specific information on de-identification and anonymization is only provided in a governmental guideline (Office for Government Policy Coordination et al. 2016). The lack of certainty around this definition could mean that, even if a hospital follows the governmental guideline, it could still be found liable in case of a breach of the duty of confidentiality (Kwon et al. 2016).
There are ongoing discussions on whether to provide consent exemptions for research using personal health data from public health databases. The major argument in support of a consent exemption is that precision medicine requires linking personal data from public and private databases, but this type of linkage is impossible if the research has to satisfy the PIPA and the BSA, because these require data to be de-identified which, in this context, should be interpreted as anonymized (Kwon et al. 2016). There is also a discussion on the suitability of a consent exemption for the usage of personal location information transmitted from mobile devices. In the Act on the Protection, Use, Etc. of Location Information of 2005, personal location information is defined as "the location information regarding a particular person including information combinable with other information to track the location of a particular person even though location information alone is not sufficient to identify the location of such person" (Art 2-2). Similar to the All of Us program in the U.S. (Rothstein 2017; Devaney 2017), the precision medicine database in South Korea plans to collect personal mobility data (Korea Institute of S & T Evaluation and Planning 2017). In principle, the Act requires that any location-based service provider obtain consent that indicates the designated recipient of the personal location information and the purpose of sharing this data with a third party (Art 19). Currently, consent is exempt where data are de-identified and provided to a third party for the purpose of statistics, academic research or market research by a location information provider, such as a mobile carrier company or a mobile application (Art 21-2). However, this seems illogical: it is impossible to provide de-identified information in this case, since personal location data contain dynamic information about time, place and movement, as well as sensitive information about close relatives, dietary style, religion, education, job, etc., and location data can also be combined with other data in academic research. Moreover, Article 21-2 should not cover this type of usage of combined data because of its potential to be re-identified in such large-scale public datasets (de Montioye et al. 2013; Rodriquez et al. 2013) and the inherent risk of re-identifiability of public biobanks linked with a geo-information system (Bovenberg et al. 2016). Arguably, consent exemption, even for academic research, should not apply, and separate consent from the data subject should be required to collect and use location information, particularly because of the contentious use of the term 'de-identification' (Phillips and Knoppers 2016).
Arising from those discussions, a dynamic consent process is currently also being considered by stakeholders as an alternative to broad consent or exemption from consent. Current discussions on consent mainly seek a balance between protecting the interests and autonomy of data subjects and optimizing the sustainability of databases (International Bioethics Committee of the UNESCO 2017; Ploug and Holm 2016; Kaye et al. 2015; Steinsbekk et al. 2013). A main concern with broad consent is that this model would not permit researchers to provide research participants with sufficient information about the further usage of individual data (Grady et al. 2016). Dynamic consent is expected to resolve this problem by using IT to facilitate specific consent objectives that respect research participants' autonomy, but this type of consent still has limitations (Joly 2018).

Privacy
In the past, South Korea developed separate laws to regulate the use of personal data in the public and private sectors. The Act on Protection of Personal Information of 1995 (2008) applied to public institutions while the Act on Promotion of Information and Communications Network Utilization and Information Protection, Etc. of 1999 (2017) applied to the private sector. The PIPA was enacted in 2011 to integrate the two sectors. It emphasizes the right to information privacy by specifying that the right to self-determination of the data subject includes a bundle of rights such as "the right to be informed of the processing of personal information;" "the right to agree or not to the processing of personal information and to the scope of consent;" "the right to request confirmation of personal information processing;" "the right to request access to personal information;" "the right to request the [processor] to suspend, correct, erase and destruct personal information;" and "the right to claim damages that result from personal information processing" (Art 4).
Genetic information is considered "sensitive information" by the PIPA and the BSA. According to the PIPA, sensitive information is defined as "the information that is likely to infringe on the privacy of data provider noticeably, such as ideology, belief, admission to or withdrawal from a trade union or political party, political opinions, health, sexual life, etc." (Art 23-1). Accordingly, personal medical information is classified as sensitive information (Art 18) and thus requires a high level of protection. It is possible to process sensitive information only when the data processor obtains the consent for such data transfer in a separate consent form from the original consent to the processing of personal information, or where other statutes permit international processing of the data (Art 23-1-1, 2).
Likewise, genetic data included in the medical record are also classified as sensitive information and protected as such, like any other kind of medical information. The principle of confidentiality, which prohibits the disclosure of personal information to third parties, including family members, without consent, therefore also applies to genomic data (MSA Art 19). Personal genetic information is included within the notion of sensitive information even if it is obtained from genetic testing for non-clinical or research purposes (BSA Art 23-1, BSA Enforcement Ordinance Art 18). A person who receives sensitive information from a personal information controller should, in principle, not use it or provide it to a third party (Art 23-1). However, there are some exceptions authorized by the PIPA. For example, public institutions can process personal genome data for the purpose of crime prevention and provide personal data to foreign governments or international organizations in conformity with international agreements or conventions (Art 18-2). Furthermore, the PIPA allows the controller of sensitive personal information to use or share this information for secondary research purposes only if the data are de-identified (Art 18-2-4). There remains the remote possibility that big data could be re-identified, and there is no concrete definition of re-identification in the South Korean regulations (Shin 2018). Current privacy approaches in academic research based on de-identification (PIPA Art 18-2-4) or anonymization (BSA 38-2) are confusing (Chung 2015). For example, genomic research is understood to be exempt from review by an IBC when a biobank provides genome data that is re-identifiable only via matching with another biobank, or when the research is not related to a donor's specific genetic traits (BSA Enforcement Ordinance Art 33). Exemptions from IBC review applicable to de-identified genetic data do not really benefit research participants. They may give participants a false sense of security since the data remain possibly re-identifiable. Nor does data de-identification truly benefit longitudinal studies, since follow-up is only possible with identifiable or coded data (Kwon et al. 2016). Accordingly, it seems more important that the information provider agrees with the protection and processing of personal information by genetic testing companies or public institutions than to attempt an objective judgment on the potential identifiability of such data. A genetic testing institution must provide genetic test results to the legal representative of an individual if requested (BSA Art 52). If, following an investigation, a genetic testing institution is found to have violated Article 52, the Ministry of Health and Welfare can "revoke the designation or registration of a genetic testing institution or a permission granted to an institution" or order it "to completely or partially suspend its operation for a specified period not exceeding one year" (Art 56-1-2). No such investigation or order has been conducted so far.

Security measures
As the value of a database grows, so does the need for a robust security system to protect personal information against loss, theft, leakage, counterfeiting, falsification and damage. Recently, incidents involving health data leakage by public agencies such as the National Health Insurance (NHI) Corporation and the Health Insurance Review and Assessment Service (HIRA) came to light in South Korea (Lee 2017). For example, a medical information programming company, which provided medical recording services used to claim national health insurance or medical care benefits for 7500 hospitals in South Korea, illegally extracted patients' clinical and prescription data (about 750 million files from 44 million individuals) and sold the data to a multinational firm (KH 2015). These incidents can negatively influence public trust in the state as a data steward. Accordingly, the personal information processor's duty to provide security safeguards has been reinforced through detailed provisions in the PIPA (PIPA Enforcement Decree Art 30).
According to the Public Notice of the Standard Measures to Ensure Safety for Personal Information (Standard Measures) (2017), the duty to safeguard implies that the security measures taken correspond to the category of personal information processed and the amount of personal information recorded (Standard Measures Art 3). The processor has the burden of proof to demonstrate that adequate levels of security safeguards were implemented (Standard Measures Art 3). Standards of security safeguards are divided into technical, administrative and physical measures according to the nature of security controls (Standard Measures Art 4). Specifically, administrative measures include management and supervision of the trustee dealing with personal information processing while technical measures include access control of personal information, checking of connection logs, mandatory inspections of uniquely identifying information and security programs; physical measures include provisions to control the import and export from storage facilities and devices and auxiliary storage media containing copies of personal information.
Among personal information, biological information, including genetic information, should be more strictly managed through a password or a personal identification number (PIN) (Standard Measures Art 7-1, 2). The personal information processor must store biometric (including genetic) information and encrypt it when transmitting through an information communication network or transmitting through an auxiliary storage medium.
Regardless of the presence of these security measures, information leakage remains a serious issue in governmental public health databases. The National Health Insurance Corporation imposed disciplinary sanctions on six employees in 2002 (Lee 2007), and 15 more employees have recently received disciplinary sanctions for illegally browsing personal information and for data leakages that occurred from 2014 to 2017 (Lee 2017). The previously described incident involving the HIRA service is another case in point. Another problem is the insufficient level of protection for mobile devices and networks. In some precision medicine initiatives, personal biometric and real-life information of individuals is collected on mobile devices and transmitted to a database. However, the PIPA was designed to promote protection measures such as physical infringement measures, including blocking access to lost or stolen mobile devices, controlling access to personal information using open Wi-Fi networks, and encryption of personal information. While smartphones generate and store individual data related to personalized biology, psychology, behavior and daily environment in real time (Arora et al. 2014), the PIPA does not have regulations on the authentication or data encryption applicable to these devices. Therefore, when someone other than the mobile device owner accesses the mobile data to collect information, uses it without permission or secretly sends inaccurate data, it is difficult to identify the mobile owner's information in the collection process or in a database. This lack of oversight may damage public trust in databases, data users and researchers due to the risk of inaccurate analysis and results. Inaccurate predictive information could eventually be harmful for research participants, particularly in high-risk populations (Maier et al. 2018; Johnson and Ghlert 2014; Wade et al. 2013).

International data sharing: compatible processing/adequacy
Automated information processing allows instant access to genomic data crossing borders in terms of geographic space, making it easy and fast to deliver information that is needed by research collaborators worldwide. Therefore, the principle of the PIPA is to establish protection from infringement of personal information that is sent beyond national borders (Art 14). Concrete conditions include the need to obtain the consent from the data subject before transferring personal data to a third party overseas and to respect all applicable requirements set forth by the PIPA (Art 17-3) including the processing of personal information, the safeguard of personal information and guarantees of rights of data subjects.
The main difference between South Korean law and the EU's GDPR is that South Korea has no provision for judging compatible adequacy in international transfers of personal data. There are no specific data sharing rules or distinctions based on the geographical origin of the third party. The BSA also has no concept of compatible adequacy in genomic data sharing. Whether the sharing is national or international, once consent has been given for genome data to be provided to a third party, the data can be provided following the decision of the researcher or biobank that intends to conduct the data sharing, and the approval of an ethics committee.
An adequacy assessment is generally required for third countries to process European data. It will be challenging for South Korean privacy law to integrate new European requirements. For instance, there is no clear emphasis on the right to be forgotten or on the right to object to automated processing, including profiling, in South Korean law.
Moreover, privacy regulations applicable to data de-identification or anonymization will need to be changed to better align with international data sharing standards (Phillips and Knoppers 2016; Kwon et al. 2016). Pseudonymization is an example of a concept that is not introduced in the PIPA. In Article 4(5) of the GDPR, it is defined as "the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organisational measures to ensure that the personal data are not attributed to an identified or identifiable natural person."
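To make the distinction between pseudonymization and anonymization concrete, the following is a minimal illustrative sketch in Python, not a method prescribed by the GDPR, the PIPA or the BSA. It assumes a keyed hash (HMAC-SHA256) as the pseudonymization technique; the function names and the record field "participant_id" are hypothetical. The secret key plays the role of the "additional information": whoever holds it can re-link records by recomputing pseudonyms from known identifiers, which is why it would have to be stored separately from the shared dataset.

```python
import hashlib
import hmac
import secrets


def make_key() -> bytes:
    # The key is the "additional information"; in this sketch it is assumed
    # to be kept by the data custodian, apart from the shared dataset.
    return secrets.token_bytes(32)


def pseudonymize(identifier: str, key: bytes) -> str:
    # Keyed hash: the same identifier and key always give the same pseudonym,
    # so records can still be linked for follow-up by the key holder.
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_record(record: dict, key: bytes,
                        id_field: str = "participant_id") -> dict:
    # Return a copy with the direct identifier replaced; other fields are
    # left untouched and may still carry re-identification risk.
    out = dict(record)
    out[id_field] = pseudonymize(str(record[id_field]), key)
    return out
```

Under these assumptions, coded data of this kind would still allow longitudinal follow-up, while remaining personal data for anyone who holds the key.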

Oversight
Although research with de-identified genomic data is considered low-risk research, ethics review for genome data sharing, including cross-border sharing with third parties, is mandated by Article 10 of the BSA. First, an ethics committee reviews and approves the content of the written consent provided by the research participants according to the legal and ethical requirements for genomic research or biobanks. In the consent form, the participant can indicate a personal preference as to whether data are provided to a third party with or without personal identifying information. Then, the biobank approves the data access request through the Distributive Review Committee (DRC) in the National Biobank of Korea under the Regulation on Operation and Management of the National Biobank of Korea (2018). For other regional biobanks, alternative ethics committees carry out ethics review (Kim et al. 2017). When an institution or a group submits a plan for the use of human material, the result of the IBC review and relevant documents are sent to the head of the KCDC, and the DRC reviews the plan and assesses whether to provide the human material (Korea Center for Disease and Prevention 2018).
Internationally, the need to obtain approval to use controlled sensitive data from an access committee can result in time delays and additional costs for researchers (Joly et al. 2012). However, similarly to Canada, there is no provision enabling a one-stop, streamlined ethics or access review process for data sharing in South Korea. This means that, even if multiple institutions engage in collaborative research, it is mandatory to obtain approvals from each institution participating in the initiative.

Conclusion
In this article, we described the complex South Korean legal and political framework applicable to international genomic data sharing. The landscape is undergoing profound changes as diverse stakeholders actively work to update the legal and administrative frameworks for data sharing and governance, and common standards are being discussed and established at the local, regional and international levels. These activities contribute to the assessment of the relevance of specific restrictions in national laws and guidelines and to the development of internationally harmonized principles for responsible data sharing inspired by the latest developments from international organizations such as the Global Alliance for Genomics and Health (Knoppers 2014).
However, even though the South Korean government is striving to increase the sharing of health and genomic data for research and development, the privacy framework applicable to this information has, until now, developed slowly. Only a few rules are specifically designed to regulate collaborative research involving international data sharing. One important issue is that existing Korean regulations focus strongly on data protection and give insufficient consideration to facilitating data usage. This state of affairs reflects public distrust and concerns about large databases due to a series of data misuse incidents that affected Korean society. Furthermore, legislative efforts to facilitate responsible sharing of genomic and health-related data will need to be implemented. Regulations applicable to cross-border data transfers will need to achieve a delicate balance between the promotion of data sharing, the improvement of mutual compatibility with western countries, and the restrictions necessary to promote better accountability of all stakeholders in the data usage chain.