1 Introduction

The primary objectives of public health agencies are to protect and promote the health of citizens through population-based strategies [1]. Public health actions include legal measures such as the mandated use of seatbelts and motorcycle helmets, taxation of unhealthy food, and mandatory reporting, testing, and treatment of serious communicable diseases. Voluntary measures include preventive programs such as vaccination, educational campaigns promoting healthy lifestyles, and nutrition supplementation for economically disadvantaged populations. Because the health and health-related behaviors of populations can be strongly influenced by environmental factors such as air pollution, green space, the retail food environment, and hygiene, public health agencies are also responsible for improving the living environment [2]. An equally important role of public health is the surveillance of population health status, which encompasses the spatiotemporal distribution of disease outcomes, behavioral risk factors, and the physical and social environments affecting health. As a backbone of public health activities, surveillance provides critical insights into population health status to guide the development, management, and evaluation of interventions. Surveillance also has a critical role in identifying health needs among disadvantaged populations, thereby assisting the development of equitable health interventions.

The role of public health agencies has expanded considerably beyond tracking and controlling communicable diseases, which were the primary cause of mortality up to the early twentieth century [3]. Noncommunicable diseases such as cancer, cardiovascular diseases, and diabetes are now responsible for the majority of global death and disability [4], which has placed increasing importance on the monitoring of, and preventive interventions for, lifestyle risk factors such as diet, physical activity, and tobacco use. Socioeconomic and demographic inequality in health remains an important public health problem, requiring the development and evaluation of equitable interventions that reach underprivileged individuals. Other critical public health mandates include prevention and surveillance of injuries, assessment of healthcare performance, improvement of maternal and infant health, and disaster response. Furthermore, the health of the public is threatened by the resurgence of vaccine-preventable diseases [5, 6] and by constantly emerging infectious diseases with significant national and global health consequences, including Severe Acute Respiratory Syndrome (SARS), West Nile Virus, the 2009 pandemic H1N1 influenza, and more recently the epidemic of Ebola hemorrhagic fever in West Africa [7]. The ever-growing complexity and breadth of public health tasks call for innovative answers that strengthen the capacity of health agencies.

1.1 Ethical Challenges in Public Health Practice

Although individual liberties such as the rights to privacy and autonomy are of fundamental importance in our society, a public health agency’s obligation to maximize community good and societal well-being inevitably conflicts with these values [8]. As an example, mandatory reporting and disclosure of personal information [9] in the name-based control of human immunodeficiency virus (HIV) posed privacy threats and led to stigmatization of infected individuals and homosexual men [10, 11], whereas the control of tuberculosis (TB) often resulted in the use of police power to enforce isolation and detention of diseased individuals and their contacts.

In addition, health promotion programs (e.g., promotion of healthy diet and physical exercise) can lead to victim blaming and stereotyping of individuals with stigmatizing conditions such as obesity, by treating the failure to adhere to a healthy lifestyle as a matter of personal responsibility [12]. Policy intervention is invariably challenged by individualist and libertarian advocates [13], as exemplified by arguments against motorcycle helmet laws, which claim that such laws are an excessive and paternalistic use of the state’s police power against the liberty of riders and that they discriminate by classifying riders as a high-risk class, despite overwhelmingly positive evidence that motorcycle helmets prevent death [14].

As for disease surveillance, public health agencies are exempt from routine ethics review and from obtaining informed consent from patients to access personally identifiable information that is collected for public health and non-public-health purposes, such as patient care and medical billing [9, 11, 15]. Although the exemption is necessary in settings where immediate access to personal data and subsequent action are paramount, as in the control of certain communicable diseases, it has also been extended to the monitoring of noncommunicable diseases, often sparking dispute over the limits of data access [16]. Breach of patient health information is a critical confidentiality problem considering the amount of personal data handled by analysts, yet such incidents are surprisingly common among local health agencies [17, 18].

Taken together, public health intervention and practice are subject to considerable moral challenges and a longstanding dispute over the trade-off between preserving individual liberties and securing collective benefits for the population.

1.2 Big Data and Web 2.0 Technologies

The web and mobile applications supported by Web 2.0 technologies have led to unprecedented growth in the personal information and communication footprints available on the Internet. The technologies and devices that surround modern lives continuously capture digital traces of daily activities, including financial, environmental, and social interactions. Advances in “Big Data” [19] research have created a unique opportunity to process these networks of heterogeneous data to generate explanatory and predictive models [20], leading to field applications in the commercial and political sectors that generate insights into, and influence, voter and consumer behaviors [21]. As an example, online browsing records are routinely used to monitor credit card fraud, develop individually targeted marketing, and create customer segmentation [21,22,23].

Big Data can be loosely characterized by the following three dimensions: (i) Volume, which describes how large the datasets are; (ii) Velocity, which indicates how fast these large datasets are processed; and (iii) Variety, which refers to how different these datasets are in source and format [24]. Although defining Web 2.0 requires technical discussion [25], in the context of public health applications we characterize the term as a group of technologies allowing web applications/services to: (a) let users upload and share content; (b) harness collective intelligence extracted from user-contributed data; (c) provide dynamic and tailored content matched to a user’s profile or needs; and (d) use the web as a platform, and thus run on a wide range of devices. Some examples of Web 2.0 applications/services are search engines, blogs, multimedia sharing websites (e.g., YouTube), social networking services (SNS), and business review websites (e.g., Yelp).

Although the application of Big Data and Web 2.0 applications/services in public health activities is largely in its infancy, their potential to enhance the capacity of surveillance and intervention has generated considerable enthusiasm [26,27,28,29,30]. Overshadowed by this excitement, however, the ethical consequences of implementing these technologies have received inadequate attention to date [31, 32]. Because earlier adoption in the commercial sector revealed their privacy-harming aspects and gave rise to public concerns about the indiscriminate use of personal data by industry [21, 33], it is imperative to address both the potential benefits of Big Data and Web 2.0 applications/technologies to the health of the public and their risks to individual liberties. The following section describes the opportunities they offer for advancing the science and practice of public health, followed by the potential harms, from an ethical perspective, of applying them to public health surveillance and intervention. Finally, we provide strategies to mitigate the risks, in the hope of providing an initial step towards the development of legal safeguards and guidelines.

2 Opportunities of Big Data and Web 2.0 Technologies

The digital stream of personal, environmental, and social attributes can be gathered and analyzed at a large scale, whereas web applications and mobile devices provide means to reach a large number of individuals at relatively low cost [34].

The transformative features of digital/mobile and participatory technologies include the ability to assess citizen sentiment and deliver individually tailored (targeted) messages at low cost [35], peer support for proactive and positive health-related behaviors within online communities [36], timely access to citizen reports at a global scale through “crowdsourcing” [37, 38], and improved outreach to young and socioeconomically deprived individuals [39, 40]. In addition, the widespread use of digital devices (e.g., mobile phones) provides an effective communication channel for reporting disease outbreaks through citizen sentinel networks in low-resource settings [41]. In effect, participatory surveillance offers the potential for mutual collaboration between public health agencies and citizen sentinels, who submit local information and in return receive information relevant to their context. This contrasts with the current use of online data by commercial industries, where users’ data is routinely sold and/or analyzed by service providers (albeit aggregated or anonymized), and the profit from inferred or discovered knowledge is not shared with Internet users, who nevertheless suffer from privacy intrusion [23].

From the perspective of social justice, an equally important benefit of mobile communication technology is its outreach to population subgroups that were previously unreceptive to or unreachable by traditional communication channels, and among whom health disparities often exist [42, 43]. Finally, given the ongoing shift of citizen communication and information seeking to digital environments [40], surveillance of the online information and communication environment is likely to become an essential task for identifying emerging public health challenges such as online advertisement of unhealthy products.

2.1 User Engagement and Empowerment

Traditional public health communication has been largely unidirectional and uniform. Health promotion, risk communication, and educational messages were typically disseminated through traditional media (e.g., television, radio, and print media) without reflecting individual context and needs [44]. The user centrality of Web 2.0 applications/services has the potential to increase citizen engagement with public health messages through personalized and tailored content [39, 44, 45]. Tailoring is particularly suited to conveying clear and relevant information to those who lack sufficient health and technology literacy. In addition, the interactive and synchronous nature of these technologies can promote bidirectional communication between public health organizations and users, potentially allowing health agencies to respond to user inquiries, share ideas, and encourage user-generated health content [39], as well as facilitating peer support and the sharing of experience within communities.

Furthermore, the collaborative character of the Internet can mitigate the effect of victim blaming due to stereotyping and labeling by the media [46] by providing an environment for peer-support communities that can empower socially underprivileged individuals with stigmatized health conditions, such as obesity and HIV [47, 48]. The openness of the Internet also provides an opportunity for activism and resistance against the “culture of stigma” or public ridicule directed at these individuals [47, 49, 50].

2.2 Population Reach

In the United States, the digital divide in terms of access to the Internet is gradually closing across socioeconomic and demographic groups. In 2015, 74% of low-income individuals reported Internet usage, a 40-percentage-point increase from 2000, and the gap in Internet adoption between high- and low-income groups shrank from 47 percentage points in 2000 to 23 in 2015 [51]. Turning to education, 64% of those without a high school diploma used the Internet in 2015, relative to 19% in 2000, narrowing the gap from 59 percentage points in 2000 to 29 in 2015 [51]. Across age groups, Internet usage showed the steepest increase among seniors, climbing from 14% in 2000 to 58% in 2015. Driving this change is the widespread use of smartphones, which were owned by nearly two-thirds of adults in developed nations in 2015 and are undergoing rapid uptake in developing nations [52, 53]. In addition, socioeconomic differences in the use of SNS appear to be absent in the U.S. [54], which suggests that health promotion through this channel can be equitable.
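The figures in this paragraph mix usage levels with differences between levels, so it may help to lay out the arithmetic explicitly. The minimal sketch below recomputes the income-related change, reading the reported 40% increase as 40 percentage points (an assumption). The high-income usage rates are not stated in the text; they are back-calculated here from the reported gaps, and the function names are illustrative only.

```python
# Internet usage (% of adults) among low-income individuals, from the text [51].
# Assumption: the reported "40%" increase is read as 40 percentage points.
low_income = {2000: 74 - 40, 2015: 74}
# Reported high-vs-low income gap, in percentage points [51].
reported_gap = {2000: 47, 2015: 23}
# Assumption: high-income rates are NOT in the text; they are implied by
# low-income rate + reported gap.
high_income = {year: low_income[year] + reported_gap[year] for year in low_income}


def point_change(before: float, after: float) -> float:
    """Absolute change in percentage points."""
    return after - before


def relative_change(before: float, after: float) -> float:
    """Relative change, as a percent of the starting value."""
    return (after - before) / before * 100


low_points = point_change(low_income[2000], low_income[2015])
low_relative = relative_change(low_income[2000], low_income[2015])
gap_narrowing = reported_gap[2000] - reported_gap[2015]

print(f"Low-income usage rose {low_points} points ({low_relative:.0f}% relative)")
print(f"Implied high-income usage: {high_income}")
print(f"Income gap narrowed by {gap_narrowing} points")
```

Keeping the two kinds of change separate matters here because a rise of 40 points from a low baseline corresponds to a much larger relative increase.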

Combined with the anonymity of the Internet, this wide population reach is particularly effective in addressing sensitive and stigmatized topics among hard-to-reach individuals, such as men who have sex with men, older adolescents and young adults, and those affected by mental illness [40, 55]. As an example, the Internet has become a popular venue for finding sexual partners among individuals at high risk of HIV infection who are unlikely to seek medical attention because of a lack of information on testing, feelings of shame, or stigmatization. Anonymous and voluntary counseling services within partner-finding websites have greatly lowered the barrier to clinic visits [56]. Socioeconomically disadvantaged and stigmatized individuals now have a greater opportunity to seek and access relevant health information and resources.

2.3 Online Surveillance

Although the Internet is becoming a primary medium for health information seeking and communication among youth [55, 57], it is a largely unregulated environment filled with information lacking credibility and legitimacy, and it has become an important channel through which industries market unhealthy products (and thus behaviors). For instance, discussion of alcohol consumption in youth SNS communities is steered towards pro-drinking attitudes by the alcohol industry [58]. Twitter is an important tool for marketing e-cigarettes, with overwhelmingly positive messages generated with commercial intent [40, 59, 60]. Industries are far ahead in realizing the value of social media for influencing consumer behaviors, with established tactics such as creating online “viral” messages [62].

Governments cannot freely regulate the content of the Internet unless the information poses a clear threat to the health of the public. However, continuous monitoring of online health information and its potential impact on health-related behavior is necessary. Given the growing importance of the Internet as a source of health information, facilitating access to credible information will be beneficial to the public. This is particularly relevant to individuals suffering from chronic health conditions, who may desperately seek expert advice [63, 64]. Finally, social media is an important venue for identifying a new generation of public health issues, including online bullying, sexual solicitation, and depression from SNS use [40].

3 Ethical Challenges in the Use of Big Data and the Web 2.0 Technologies

The lack of appropriate regulation of the collection and analysis of personal data by industry and of government surveillance has resulted in public concerns about the loss of privacy and autonomy and the profiling of citizens [21, 65]. Tracking online customer activities (often without users’ knowledge) has become an industry-wide practice. Personally identifiable data is routinely collected, centralized, and permanently stored by search engines and social networking sites (SNS) and undergoes mass disclosure (i.e., is sold to third parties) without users’ permission or even awareness [66]. Such data is in turn used to predict personally sensitive attributes, including sexual orientation, political views, and parental separation [21, 67], and to generate personalized advertisements aimed at driving consumer preferences and decisions [68]. In addition, widespread sharing of personal information on the Internet poses a unique challenge in defining online privacy [69, 70]. Users who self-disclose personal information on SNS or blogs typically expect their messages to be exposed to the online world. However, this does not imply that the data is open to amassing and indiscriminate use, as such activities can lead to reidentification and potential harm to the users [70], which certainly conflicts with users’ expectations [69]. Individuals affected by stigmatizing illness express particular concern about the unintended use of their data by third parties [71]. Despite the public nature of social media, privacy does exist as a right to prevent data usage that leads to stigmatization, restriction of liberty, and violation of privacy.

Some users publish their health and non-health information openly owing to a sheer lack of privacy concern and awareness. Such users are unlikely to be familiar with the contents of the privacy policies and statements of social media services, which may state that personal information can be used by third parties. Even if users attempt to read privacy statements, the inconsistency and complexity of privacy settings and policies across ever-growing social media applications make controlling personal information far from trivial [72,73,74]. The legitimacy of these policies, absent users’ explicit understanding of the risks and benefits, is therefore highly questionable, and sharing sensitive information in the online environment does not necessarily equate to consent for data usage. Furthermore, it is not clear whether the data usage policies of these websites encompass public health surveillance. Although public health agencies are exempt from obtaining consent for secondary use of patient health records, such action is justifiable only in the presence of a clear and immediate public health risk [11].

3.1 Profiling and Predictive Analytics

Predictive analysis can be performed at both the individual and community levels to predict future health-related attributes (e.g., diseases, prognosis, behaviors, and resource utilization) or to infer unobserved health-related attributes, as in disease diagnosis. The rapid increase in, and linking of, personal, social, and environmental information from a large number of individuals on the Internet provides a sufficient set of predictive features for accurate inference of sensitive personal attributes [75] such as body mass index, depression, emotion, sexual orientation, personality, and various behaviors [76,77,78].

Because predictive features are often not collected directly from individuals but accumulated by the organizations running such algorithms, the targeted individuals have little control over the use of their information to predict their personal or community attributes. These predicted personal features can be highly sensitive and can potentially be used in ways that damage individuals’ liberty and reputation [75]. The indiscriminate use of predictive algorithms can open an avenue to a public health analog of “predictive policing”, in which health agencies could target, contact, and impose restrictive interventions on individuals at heightened risk of developing a highly pathogenic communicable disease or a behavior of public health significance (e.g., violence or illicit drug use).

Similar to medical screening, these algorithms can facilitate early identification and notification of potentially serious health conditions and thereby support effective prevention measures. However, “black box” algorithm-based diagnosis and risk prediction from online data sources are qualitatively distinct from screening. Unlike screening tests, which are typically performed with informed participation, predictive algorithms can secretly profile millions of social media users without the awareness of those who are targeted. Accidental or intentional disclosure of predicted disease status can affect socioeconomic status, including employment and insurance, whereas prediction of highly personal characteristics such as sexual orientation can be intrinsically offensive. Excessive use of predictive algorithms to generate large repositories of sensitive characteristics is highly concerning. It is also difficult for individuals to escape the privacy assault of prediction and to resist presumptive activities and automatic profiling by the government [79, 80], especially in the face of automatic data collection by ubiquitous sensor devices (e.g., smartphones), for which data submission is often opt-out.

Applying predictive algorithms at the community or group level generally prevents privacy harm at the individual level because of aggregation; however, profiling and ranking communities by future disease burden may lead to public discomfort and may label high-risk communities (e.g., those with a high prevalence of smoking, obesity, or alcohol consumption) as a potential burden on society. Community health profiling based on traditional data sources, such as population health surveys, has been a routine task of health departments. However, prediction of future community health by a data-driven approach undermines the fundamental value of citizens’ “right to know”, potentially causing distrust of health authorities. At worst, the fear of being labeled by such algorithms will lead privacy-sensitive individuals to avoid social media, whereas those lacking privacy and technology literacy will surrender all their personal attributes to the intrusive activity of predictive algorithms.

3.2 New Age of Digital Divide

Although diminishing steadily, the digital divide impedes fair distribution of technological benefits throughout society. Individuals lacking connectivity to the Internet may be protected from the data gathering activities of government and industry; however, their needs and voices may be underrepresented in the online society, resulting in interventions that do not reflect their socioeconomic and cultural needs and interests.

An emerging and equally important form of technological disparity is a gap in usage purpose and in the capability to make effective use of online content for informed health-related decision-making [45, 81, 82]. Even with Internet access, those with lower educational attainment are less likely to use online content to enrich their socioeconomic resources (e.g., information beneficial to career, education, and social position) or to seek health information [83, 84]. One of the main factors driving this usage gap, or secondary-level digital divide, is a lack of technology literacy among individuals with low socioeconomic status [64, 85].

However, the usage gap appears to exist even after conditioning on technological literacy, suggesting the presence of motivational factor(s) [83]. Although such differential usage may have existed for traditional media, including newspapers and television, the amount and diversity of information on the Internet are far greater, potentially resulting in a larger inequality across social classes in the benefit gained from the Internet [83]. If the usage pattern of the Internet reflects existing socioeconomic status more strongly than the usage pattern of traditional media does, simple reliance on technology-based interventions may exacerbate socioeconomic disparities in health-related knowledge and behaviors and in other gains from Internet-based interventions.

At the international level, the borderless nature of digital data could also inadvertently introduce inequity across nations when used in global health surveillance. For example, data generated in low- and middle-income countries is often analyzed in research labs and public health organizations in developed nations with little or limited return to population health in the countries of origin, which may eventually be discouraged from sharing information in the event of an international health crisis [86].

4 Discussions

Used effectively, Big Data and Web 2.0 technologies can help researchers and public health agencies meet the expanding responsibilities of public health and improve community and individual health. If, however, their ethical challenges remain unaddressed, public distrust and opposition can prevent the utilization of potentially useful technologies and datasets, thereby limiting the future capacity of public health agencies. One of the challenges is that the speed of technological advancement and of potential applications could far exceed the pace at which legislation can catch up. Nevertheless, there is an urgent need to discuss the development of tactics to facilitate their implementation in public health practice.

In this section, we explore key aspects of Big Data and Web 2.0 applications/services needed for (i) minimization of privacy harm; (ii) fair distribution of benefit; and (iii) transparency and user engagement.

4.1 Privacy and Confidentiality

Given the growing concerns over intense data gathering by government and industry [87], public acceptance of the application of social media and mobile technologies hinges on adequate assurance of privacy and confidentiality. Existing local public health laws on the collection, storage, and sharing of electronic health data are inconsistent and incomplete, and typically require major review and reform to reflect the growing use of electronic personal information [18]. In addition, the challenge of defining privacy for online personal information makes the protection of privacy even more complex [88]. Attempts to find privacy solutions are scarce [71, 88], yet urgently needed.

4.1.1 Policy

Guidelines for security, privacy, and confidentiality concerning the use of personally identifiable information for public health purposes have been the subject of discussion. Although these discussions do not specifically address online personal information and are more often limited to patient data generated and maintained by healthcare organizations, they are broadly applicable to the secondary use of personally identifiable information in the online environment. Specifically, the amount of personal information collected should be the minimum necessary to meet the well-defined goals of the surveillance and disease control program [89]. The openness of the Internet and the rapidly falling cost of data storage can create a temptation for large-scale collection, processing, and centralization of personal information. Such practice, if not guided by a clearly defined usage purpose, will introduce a predictable risk of reidentification [9].
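The minimum-necessary principle can be made concrete as a filter applied before any record is stored. The following is only an illustrative sketch: the surveillance goal, the field names, and the function `minimize_record` are hypothetical and do not come from the cited guidelines.

```python
from typing import Any

# Hypothetical whitelist: the only fields a hypothetical influenza-like-illness
# surveillance program is assumed to need for its stated goal.
REQUIRED_FIELDS = {"report_week", "postal_prefix", "age_group", "symptom_code"}


def minimize_record(record: dict[str, Any],
                    required: set[str] = REQUIRED_FIELDS) -> dict[str, Any]:
    """Keep only the fields needed for the stated purpose; drop everything else."""
    return {key: value for key, value in record.items() if key in required}


raw = {
    "name": "Jane Doe",            # direct identifier, not needed for the goal
    "email": "jane@example.org",   # direct identifier, not needed for the goal
    "report_week": "2015-W07",
    "postal_prefix": "H3A",
    "age_group": "30-39",
    "symptom_code": "ILI",
    "free_text": "...",            # unstructured text may leak identifiers
}
minimized = minimize_record(raw)
print(minimized)
```

Using a whitelist rather than a blacklist means that any newly available field is excluded by default until a usage purpose for it has been defined.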

Public release of findings from surveillance programs (i.e., reports) requires appropriate de-identification measures, such as data aggregation, to prevent reidentification, and should be subject to automated or manual review prior to dissemination [90]. If findings relate to stigmatizing diseases, prior consultation with affected communities is needed to prevent stigmatization and discrimination. Unfortunately, no de-identification method can assure complete anonymity, as linkage to a large number of external and publicly available datasets can lead to reidentification of individuals [91, 92]. Therefore, anonymity needs to be treated as a continuous scale [21], with the privacy risk to individuals carefully balanced against the population health gains achieved by public health programs.
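As one hedged illustration of aggregation before release, the sketch below counts cases per cell and masks cells that fall under a minimum size. The threshold of 5, the field names, and the function `aggregate_with_suppression` are assumptions made for the example, not values taken from the cited sources, and such suppression reduces rather than eliminates reidentification risk.

```python
from collections import Counter

# Hypothetical threshold: cells with fewer cases than this are suppressed.
MIN_CELL_SIZE = 5


def aggregate_with_suppression(records, keys, min_cell_size=MIN_CELL_SIZE):
    """Count records per combination of `keys`; mask counts below the threshold."""
    counts = Counter(tuple(record[k] for k in keys) for record in records)
    return {
        cell: (n if n >= min_cell_size else None)  # None marks a suppressed cell
        for cell, n in counts.items()
    }


# Toy data: 12 cases in one region, 2 in another.
records = (
    [{"region": "North", "age_group": "30-39"}] * 12
    + [{"region": "South", "age_group": "30-39"}] * 2
)
table = aggregate_with_suppression(records, keys=("region", "age_group"))
for cell, n in sorted(table.items()):
    print(cell, n if n is not None else f"<{MIN_CELL_SIZE}")
```

Raising or lowering the threshold moves the release along the continuous scale of anonymity described above, trading detail in the report against the risk to individuals in small cells.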

Moreover, limits on the scale and scope of data collection and disclosure need to be clearly specified even during public health crises such as the emergence of highly pathogenic communicable diseases. The exemption from ethics review and from obtaining patients’ consent to access personal health information opens up unlimited authority for data access and potentially results in an excessive level of intrusive activity [93]. Such totalitarian measures of disease control do not guarantee success, whereas public disapproval and outcry are certain, which could set back the use of modern data science technology in public health.

4.1.2 Privacy by Design and Education

Confidentiality policies need to be accompanied by organizational practice, which can be achieved by training employees who have access to data and by implementing preventive technologies. Rigorous authentication and authorization protocols play a primary role in restricting inappropriate data release. These include multiple modes of authentication, limiting data access to the minimum necessary number of personnel, physical restriction of access to workstations and servers, and periodic review of access privileges. Encryption of data transfer and communications is another necessary system-wide security measure, particularly when remote access to data sources is required. In addition, any personal information in the data center should be accessed by authorized individuals only on a need-to-know basis.

Most breaches of personal health information are unintentional and occur internally within health organizations, as opposed to being caused externally with malicious intent [18]. Data breaches can be prevented effectively by enhancing vigilance in the handling of personal information through confidentiality agreements with data analysts, who need to be properly informed of the consequences of inappropriate disclosure, including disciplinary action. Furthermore, routine auditing of data usage will allow the detection of breaches and malpractice, and will also reinforce enforcement and awareness of privacy-conscious practice.
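Need-to-know authorization and routine auditing can be combined so that every access attempt, granted or not, leaves a reviewable trace. The sketch below is a simplified illustration under stated assumptions: the user names, dataset names, and the functions `request_access` and `denied_attempts` are hypothetical, and a real deployment would rely on an established identity system and a tamper-evident log store.

```python
from datetime import datetime, timezone

# Hypothetical mapping of analysts to the datasets their duties require.
ACCESS_GRANTS = {
    "analyst_a": {"influenza_reports"},
    "analyst_b": {"influenza_reports", "tb_registry"},
}

# In-memory list for illustration; in practice an append-only store.
audit_log = []


def request_access(user: str, dataset: str, purpose: str) -> bool:
    """Grant access only on a need-to-know basis and record every attempt."""
    allowed = dataset in ACCESS_GRANTS.get(user, set())
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "dataset": dataset,
        "purpose": purpose,
        "allowed": allowed,
    })
    return allowed


def denied_attempts(log):
    """Entries an auditor might review first during a routine audit."""
    return [entry for entry in log if not entry["allowed"]]


request_access("analyst_a", "influenza_reports", "weekly report")
request_access("analyst_a", "tb_registry", "unspecified")
print(denied_attempts(audit_log))
```

Recording the stated purpose alongside each request gives auditors a basis for questioning access that was technically permitted but not clearly justified.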

4.1.3 Information Control

Whether one’s data is intended to be viewed by the public or by peers, assurance of information privacy requires that consumers have control over their personal attributes in terms of what information to share, with whom, and for what purpose, in addition to receiving adequate information about the privacy policies of Web 2.0 applications/services. Ideally, these controls encompass the amount and type of information for location-based analysis, the spatiotemporal context, and the recipients and users of such data, as well as access to the results of analysis [88].

Unfortunately, the low availability, readability (e.g., excessive text), and interpretability (e.g., legal terms) of policy statements, even among commonly used applications, deter consumers and consequently leave them blind to privacy risks [66, 73]. In addition, there are common misconceptions about policies in social networks, such as an incomplete understanding of the “permanency” of recorded data and a false sense of security against reidentification among Twitter users [71]. Many users also wrongly interpret the presence of a privacy policy as an assurance that their personal data is protected, even though policy statements do not always describe privacy protection [94]. Users therefore often fail to set an appropriate level of disclosure, resulting in unintended disclosure of their own and others’ sensitive information [95, 96]. A clear privacy policy covering confidentiality rules and the usage and sharing of personal information by service providers and third parties is needed, and such information should be provided at a sixth-grade reading level [73]. In addition, the terminology and formats of privacy statements should be standardized across providers/developers of health applications [97].
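A reading-level target such as the sixth grade can be checked automatically, at least approximately. The sketch below uses the Flesch-Kincaid grade level formula, which is one common readability measure and is swapped in here as an assumption, since the text does not name a specific metric. The syllable counter is a crude vowel-group heuristic, so the scores are rough estimates, and both sample sentences are invented for illustration.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count groups of consecutive vowels (at least one)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fk_grade(text: str) -> float:
    """Flesch-Kincaid grade level, using the heuristic syllable counter."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


# Invented sample statements, not taken from any real privacy policy.
plain = "We keep your name safe. We do not sell your data."
legal = ("Notwithstanding the foregoing, personally identifiable information "
         "may be disclosed to affiliated third parties for analytical purposes.")

for label, text in (("plain", plain), ("legal", legal)):
    grade = fk_grade(text)
    print(f"{label}: grade {grade:.1f}, meets sixth-grade target: {grade <= 6}")
```

A check of this kind could flag policy text for manual rewriting, though a low score alone does not guarantee that users understand the risks described.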

4.2 Population Reach and Engagement

Improving citizen engagement and population reach is crucial for effective dissemination of public health interventions and for maintaining a representative population in public health surveillance. Specifically, strong public engagement and collaboration can help ensure data quality and prevent loss to follow-up in longitudinal data collection, while wider population outreach facilitates the inclusion of underserved and hard-to-reach individuals.

Although the primary level of the digital divide (i.e., lack of access to the Internet) may continue to narrow due to the widespread use of mobile devices, the gap remains important for certain population subgroups, such as those who live in rural areas [45, 51]. Providing communication infrastructure and affordable Internet access to these communities requires public/private collaboration and government subsidization [85]. In addition, lack of access among socioeconomically disadvantaged populations persists in many developing nations [85], where a shared connection at a community access center may be the only viable solution [98].

Narrowing the secondary level of the digital divide (i.e., differences in usage patterns of the Internet among those who have access) will at a minimum require improving technology and health literacy, which may encourage active participation (e.g., sharing information, peer-to-peer communication, and seeking credible health-related information) [64, 85]. At the same time, the complexity and amount of information should be kept to the minimum necessary to meet the needs of users with a minimal level of health literacy [45]. Because the usage gap may also depend on factors beyond literacy (e.g., sociocultural preference) [83], further research is needed to explore such barriers and to identify approaches that motivate Internet usage for the empowerment of health-related behaviors and information seeking.

Evaluation of behavioral change through Web 2.0 applications/services has relied on small-scale randomized trials [34, 99]. However, study populations from small trials do not adequately represent the heterogeneous needs of the general population, particularly when measuring the performance of interventions among underserved and underprivileged population segments, who may be less likely to participate in these studies due to lack of Internet access or of motivation to participate. Large-scale trials that minimize attrition, or natural experiments, will provide the best evidence for assessing whether Internet-delivered interventions are equitable in quality and effectiveness.

To date, the majority of research studies investigating the applications of social media have not fully exploited the “social” component; rather, researchers have tended to treat SNS as a mere broadcasting tool, without tailoring messages to the diverse socioeconomic and cultural contexts of users [48, 99, 100]. Focusing on unidirectional communication while ignoring interactivity, tailoring, and peer engagement is likely to add little value over traditional communication channels. Efforts should be directed towards leveraging the user-centric features of participatory applications to meet the heterogeneous needs of population subgroups, and tailored messaging should encourage informed decision-making among people who vary in language, cultural background, disability, health and technological literacy, and numeracy [101, 102, 103]. With the media-rich features of the Internet, messages can be conveyed through a combination of images and video to enhance uptake among these users [45].

Properly crafted health messages and an engaging environment will be effective instruments for empowering individuals, especially those with stigmatizing illnesses, who are more likely to use the Internet to seek health information and peer support [49, 104]. However, because health promotion messages can be amplified and can escalate into public ridicule of individuals suffering from potentially stigmatizing conditions, content should be delivered in a way that minimizes stereotyping while facilitating the active participation and empowerment of these individuals [49].

4.3 Transparency

Public health is a societal effort requiring active citizen engagement, which demands transparency in public health practices and decision-making processes. Transparency implies timely dissemination of information about the process of data collection and analysis, and about the purpose, risks (e.g., privacy issues), and benefits of surveillance [105]. For public health interventions, all levels of the relevant decision process (i.e., planning, implementation, operation, evaluation, and public reporting) should be available to the public. Providing such information to disadvantaged individuals and communities, who are often the least likely to benefit from public health actions [71, 105], is a particularly important task in increasing transparency.

Informing the community is especially important when using publicly available online data because (i) public perception of Big Data analytics may have shifted from a “creepy practice” to an actual threat to privacy following revelations of government surveillance and industry practices conducted largely without public knowledge [65, 106]; (ii) inappropriate use of such data will cause more serious damage to the privacy and autonomy of individuals because of the large population coverage of digital data; and (iii) surveillance programs and predictive algorithms that process unstructured online data often require complex analysis that is beyond the understanding of lay community members. The feeling of being monitored and profiled by a “black box algorithm” can cause serious distress.

One of the major benefits of public disclosure is that it prevents illegitimate and unjustified use, manipulation, and distribution of personal information by health departments while limiting data collection and analysis to the minimum necessary for a specific and intended purpose [21]. In addition, in anticipation of a public health crisis (e.g., bioterrorism or an emerging infectious agent with pandemic potential) and the resulting use of authoritative power for effective emergency response, it is critical to specify appropriate safeguards for individual rights and due process [107]. Despite the unavoidable risk to privacy, patients and the general public would allow the secondary use of their data for surveillance; however, such trust is unlikely to be cultivated unless the rationale of the monitoring program is made transparent [61].

5 Conclusions

Successful application of Big Data will provide enhanced capabilities for disease surveillance and for the dissemination of public health interventions. Public health surveillance can also benefit from the data stream generated by a large number of sentinel citizens and from the ubiquitous nature of Web 2.0 applications/services, which enables increased population outreach. The interactive and collaborative characteristics of Web 2.0 applications/services are likely to enhance citizen engagement with health interventions and to encourage proactive contribution of user information to public health surveillance. In addition, the anonymity of the Internet provides protection from stigmatization and stereotyping when communicating with public health agencies or among peers.

However, the use of these technologies will be justifiable only if accompanied by a clear ethical framework and guidelines. Applications of Big Data are still in their infancy, and their impact on public health practice is yet to be revealed. Nevertheless, it is crucial to explore the potential benefits and harms at the earliest opportunity, both to avoid repeating the mistakes from which industry has already had to learn and to maximize the utility of Big Data for the public good.