The Personal Data Is Political

  • Bastian Greshake Tzovaras
  • Athina Tzovara
Open Access
Part of the Philosophical Studies Series (PSSP, volume 137)


The success of personalized medicine relies not only on methodological advances but also on the availability of data to learn from. While generating and sharing large data sets is becoming increasingly easy, there is a remarkable lack of diversity within shared datasets, which makes novel scientific findings directly applicable to only a small portion of the human population. Here, we investigate two fields that have been strongly shaped by data sharing initiatives: neuroscience and genetics. Exploring the limitations that result from a lack of participant diversity, we argue that data sharing in itself is not enough to enable a global personalized medicine.


Keywords: Genetics, Personalized medicine, Neuroscience, Data sharing, Diversity, Open data, Machine learning

8.1 Introduction

Personalized or stratified medicine has become one of the hot topics in health care, with interest reaching well beyond the launch of the Precision Medicine Initiative in the United States (Collins and Varmus 2010). The promise of personalized medicine is to identify individuals at risk and to find optimally tailored health care solutions based on their genetic and environmental makeup (Lu et al. 2014). Although personalized medicine spans a variety of medical and biological disciplines, two subfields are particularly promising due to their growing adoption: genetics and neuroscience. Indeed, many current examples of precision medicine come from pharmacogenomics in general and from oncology in particular, where cancer treatments are chosen to match the mutations found in tumours (Kummar et al. 2015; Smith 2012; Tan and Du 2012).

While the use of genetic data in health care is projected to become more central in the coming years, its success will depend on multiple factors. As for most things in health care, cost plays a huge role. But while the costs of high-precision medical examinations, such as brain scans, and of sequencing a human genome continue to drop (Wetterstrand 2018), their usefulness is limited both by our ability to process these large amounts of data quickly and by our lack of medically relevant knowledge about individual genetic variants (Dewey et al. 2014) and complex neurobiological processes. It is therefore key that science be able to generate genetic knowledge more quickly (Kohane 2015).

Two recent trends in science, big data and artificial intelligence, appear promising not only for accelerating our genomic and neurobiological understanding but also for diagnosis within a precision medicine framework (Moon et al. 2007; Dilsizian and Siegel 2014). The idea is that artificial intelligence can be used to mine large data sets for even subtle associations between genetic variants or neuromarkers and disease phenotypes, and to track disease progression or predict optimal treatments. To create such large data collections, it is central to link and share individual data sets (Kohane 2015). But while the number of base pairs sequenced per unit of time and the number of participants included in neuroscientific studies have increased exponentially over the last years, sharing practices for such data have not kept pace (Kovalevskaya et al. 2016), despite individual efforts to enable open sharing of genetic (Mao et al. 2016; Greshake et al. 2014) or neuroscientific (Poline et al. 2012) data.

8.2 Sharing Genomic Data

To alleviate these shortcomings, academic consortia have been founded to pool data sets across institutions and individual researchers. National efforts include UK10K (“UK10K” 2018), which aimed to sequence 10,000 participants in the United Kingdom, and the similarly structured 100,000 Genomes Project by Genomics England (“Genomics England” 2018). In the United States, the Exome Aggregation Consortium (ExAC) (“ExAC” 2018), which has collected over 60,000 exomes, and more recently the All of Us initiative (“All of Us” 2018) are collecting and aggregating patient data for research purposes. Academic research is not the only sector collecting large data sets for personalized medicine: commercial companies are starting to explore the field too.

Since deCODE Genetics and 23andMe released the first direct-to-consumer genetic tests in 2007 (Vorhaus 2010), the market for commercial genetic testing has grown significantly, both in the number of companies that have entered it, such as MyHeritage, FamilyTreeDNA, AncestryDNA and Veritas, and in the number of people who have been tested through these services. Today, AncestryDNA has over five million customers and industry veteran 23andMe has genetic data for over two million people (McAllister 2017). These sizable commercial databases are of interest to both academic and commercial researchers: 23andMe has collaborated with academic researchers on numerous research papers (“23andMe Research” 2018) and has entered into for-profit collaborations with pharmaceutical companies such as Pfizer and Genentech.

Who profits from such large-scale research remains an open question. Psychology offers an instructive example, as the field has acknowledged the need to examine how representative its study participants are. After all, around 80% of all participants in psychology studies are from WEIRD (Western, Educated, Industrialized, Rich, Democratic) countries and thus do not represent human diversity (Henrich et al. 2010). Consequently, only WEIRD individuals can fully profit from much of psychological research. To avoid a similar overrepresentation of WEIRD individuals, it is key that our genetic research data resources reflect human diversity across populations. This issue of representativeness becomes even more central in the framework of genome-wide association studies (GWAS). These studies are commonly used to inform personalized medicine by identifying genetic risk factors, e.g. for cancer (Agyeman and Ofori-Asenso 2015). Unfortunately, most of the identified risk factors are mere correlations: genetic markers associated with a disease, not variants that directly cause it. A marker is often associated with a disease only because it tends to be inherited together with a nearby causal variant, and these patterns of co-inheritance (linkage disequilibrium) differ between populations. As a result, the findings of a GWAS are not necessarily applicable outside the population in which an association was initially found (Bush et al. 2012), and in many cases they cannot be replicated (Marigorta et al. 2013).

Indeed, many data sharing efforts show such a lack of population diversity: more than 50% of the over 60,000 samples in the ExAC consortium come from a European population (“ExAC” 2018). Similarly, commercial databases such as that of 23andMe suffer from ancestry and race biases (“Problems with 23andMe Ancestry Composition” 2015; Euny Hong 2016). Open genomic databases, like the Personal Genome Projects and openSNP, are not faring much better: 75% of participants in one of Harvard’s Personal Genome Project studies identified as white (Mao et al. 2016), and in a survey of over 500 openSNP participants, over 70% came from the US, UK and Canada. Additionally, over 75% of the surveyed openSNP participants had at least a Bachelor’s degree, pointing to a highly skewed demographic (Haeusermann et al. 2017).

8.3 Sharing Neurobiological Data

Like genetics, neuroscience has come a long way in data sharing: while initial attempts to share data mainly focused on post-processed data, such as coordinate-based results or statistical maps from magnetic resonance imaging (MRI) (Fox and Lancaster 2002), more recent initiatives enable sharing of entire functional or structural MRI datasets (Gorgolewski et al. 2015; Poldrack et al. 2013) and of magneto- or electroencephalography (M/EEG) data (Niso et al. 2016).

As in psychology and genomics, neuroscience research is largely based on data from individuals in WEIRD societies (Falk et al. 2013), despite a plethora of studies showing that brain development is affected by socioeconomic status, early life stress, or cultural differences (Hackman et al. 2010; Marshall et al. 2018; Chan et al. 2018; Duval et al. 2017; Liddell and Jobson 2016). Indeed, childhood socio-economic variables, both within the household, such as family income and parental education (Ellwood-Lowe et al. 2018; Weissman et al. 2018), and beyond it, such as neighbourhood poverty levels (Marshall et al. 2018), can be traced in trajectories of brain development and are reflected in differences in brain structure (Ellwood-Lowe et al. 2018), cognitive function (Hackman and Farah 2018) and gene expression (Parker et al. 2017). Differences in brain networks according to socio-economic status are also evident during adolescence (Weissman et al. 2018) and adulthood (Chan et al. 2018).

Furthermore, culture has been shown to influence neural functions (Liddell and Jobson 2016). Cultural and ethnic differences have an impact on emotion perception and expression, and on brain responses to emotional or social cues (Derntl et al. 2012). Moreover, ethnic differences have been found in physiological responses to fear or novelty (Martínez et al. 2014; Kredlow et al. 2017), which are commonly used to assess anxiety or post-traumatic stress disorders (Bach et al. 2017). The situation is aggravated by the fact that ethnicity can influence skin conductance responses (Kredlow et al. 2017), a common laboratory measure of fear mechanisms (Tzovara et al. 2018). Because participants with weak skin conductance responses are often excluded from analyses as non-responders, this can lead to the exclusion of members of ethnic groups that are at higher risk, e.g. for post-traumatic stress disorder (Roberts et al. 2011).

How strongly existing neuroscience data sharing efforts are affected by these biases is currently hard to estimate: although these initiatives generally support standardized data formats (Niso et al. 2018; Gorgolewski et al. 2016), they rarely include concrete guidelines for reporting socio-demographic variables (Madan 2017).
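Standardized formats such as the Brain Imaging Data Structure (BIDS; Gorgolewski et al. 2016) already offer a place where socio-demographic variables could be recorded: a `participants.tsv` table with one row per participant, whose columns can be described in a `participants.json` sidecar. The sketch below assumes that layout; the extra columns (`ethnicity`, `education`), their levels and the records are hypothetical illustrations, not a BIDS guideline, and the helper `build_participants_files` is our own.

```python
import csv
import io
import json

# Hypothetical records. participant_id is the key column of participants.tsv;
# the socio-demographic columns are illustrative additions. "n/a" marks a
# missing value, following the BIDS convention for tabular files.
PARTICIPANTS = [
    {"participant_id": "sub-01", "age": 34, "sex": "F",
     "ethnicity": "Hispanic or Latino", "education": "bachelor"},
    {"participant_id": "sub-02", "age": 51, "sex": "M",
     "ethnicity": "n/a", "education": "secondary"},
]

# Sidecar describing every column (Description, Units and Levels keys).
SIDECAR = {
    "age": {"Description": "Age at time of testing", "Units": "year"},
    "sex": {"Description": "Self-reported sex",
            "Levels": {"F": "female", "M": "male"}},
    "ethnicity": {"Description": "Self-reported ethnicity, free text"},
    "education": {"Description": "Highest completed level of education",
                  "Levels": {"secondary": "secondary school",
                             "bachelor": "bachelor's degree or equivalent"}},
}


def build_participants_files(rows, sidecar):
    """Return (participants.tsv text, participants.json text).

    Raises ValueError if any column other than participant_id lacks a
    description in the sidecar, so undocumented variables cannot slip through.
    """
    columns = list(rows[0].keys())
    undocumented = [c for c in columns
                    if c != "participant_id" and c not in sidecar]
    if undocumented:
        raise ValueError(f"columns without description: {undocumented}")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue(), json.dumps(sidecar, indent=2)


tsv_text, json_text = build_participants_files(PARTICIPANTS, SIDECAR)
print(tsv_text)
```

Refusing to write columns that lack a description is one possible design choice: it makes the meaning of each socio-demographic variable explicit for anyone reusing the shared dataset.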

8.4 Data Sharing as a Social Movement

All of this paints a bleak picture: the populations we are using to develop personalized medicine are highly WEIRD (Henrich et al. 2010). Even worse, we may often not even be aware of this, as we are not collecting the demographic data needed to identify our biases. Furthermore, depending on the field, research studies may include only small samples, making it hard to evaluate how ethnicity or social factors influence neurobiological functions and gene expression. Only by sharing diverse datasets that include rich demographic information will it be possible to make our understanding of disease progression and neurobiological functions relevant to all individuals, irrespective of their social or ethnic background.

Back in 2005, Thomas Friedman firmly believed that the next great breakthrough in bioscience could come from a 15-year-old who downloads the human genome in Egypt (Pink 2005). Today, we have to acknowledge that there is a good chance this 15-year-old would not be able to profit from their own breakthrough. We are thus still far from a truly personalized medicine, and this is what makes our personal data political. It is up to us, the people who generate and share data, to work on changing this and to ensure that the promise of personalized medicine is equitable. Or, in Carol Hanisch’s words: “There are no personal solutions at this time. There is only collective action for a collective solution” (Hanisch 1969).


  1. “23andMe Research”. 2018.
  2. Agyeman, Akosua Adom, and Richard Ofori-Asenso. 2015. Perspective: Does personalized medicine hold the future for medicine? Journal of Pharmacy and Bioallied Sciences 7 (3): 239.
  3. “All of Us”. 2018.
  4. Bach, D.R., A. Tzovara, and J. Vunder. 2017. Blocking human fear memory with the matrix metalloproteinase inhibitor doxycycline. Molecular Psychiatry 23: 1584–1589.
  5. Bush, William S., and Jason H. Moore. 2012. Chapter 11: Genome-wide association studies. PLoS Computational Biology 8 (12): e1002822.
  6. Chan, Micaela Y., Jinkyung Na, Phillip F. Agres, Neil K. Savalia, Denise C. Park, and Gagan S. Wig. 2018. Socioeconomic status moderates age-related differences in the brain’s functional network organization and anatomy across the adult lifespan. Proceedings of the National Academy of Sciences of the United States of America 115: E5144–E5153.
  7. Collins, Francis S., and Harold Varmus. 2010. A new initiative on precision medicine. Perspective 363 (1): 1–3.
  8. Derntl, Birgit, Ute Habel, Simon Robinson, Christian Windischberger, Ilse Kryspin-Exner, Ruben C. Gur, and Ewald Moser. 2012. Culture but not gender modulates amygdala activation during explicit emotion recognition. BMC Neuroscience 13 (1): 54.
  9. Dewey, Frederick E., Megan E. Grove, Cuiping Pan, Benjamin A. Goldstein, Jonathan A. Bernstein, Hassan Chaib, Jason D. Merker, et al. 2014. Clinical interpretation and implications of whole-genome sequencing. JAMA – Journal of the American Medical Association 311 (10): 1035–1044.
  10. Dilsizian, Steven E., and Eliot L. Siegel. 2014. Artificial intelligence in medicine and cardiac imaging: Harnessing big data and advanced computing to provide personalized medical diagnosis and treatment. Current Cardiology Reports 16 (1).
  11. Duval, Elizabeth R., Sarah N. Garfinkel, James E. Swain, Gary W. Evans, Erika K. Blackburn, Mike Angstadt, Chandra S. Sripada, and Israel Liberzon. 2017. Childhood poverty is associated with altered hippocampal function and visuospatial memory in adulthood. Developmental Cognitive Neuroscience 23: 39–44.
  12. Ellwood-Lowe, Monica E., Kathryn L. Humphreys, Sarah J. Ordaz, M. Catalina Camacho, Matthew D. Sacchet, and Ian H. Gotlib. 2018. Time-varying effects of income on hippocampal volume trajectories in adolescent girls. Developmental Cognitive Neuroscience 30: 41–50.
  13. Euny Hong. 2016. 23andMe has a problem when it comes to ancestry reports for people of color.
  14. Falk, Emily B., Luke W. Hyde, Colter Mitchell, Jessica Faul, et al. 2013. What is a representative brain? Neuroscience meets population science. Proceedings of the National Academy of Sciences of the United States of America 110 (44): 17615–17622.
  15. Fox, Peter T., and Jack L. Lancaster. 2002. Mapping context and content: The BrainMap model. Nature Reviews Neuroscience 3 (4): 319–321.
  16. Genomics England. 2018.
  17. Gorgolewski, Krzysztof J., Gael Varoquaux, Gabriel Rivera, Yannick Schwarz, Satrajit S. Ghosh, Camille Maumet, Vanessa V. Sochat, et al. 2015. A web-based repository for collecting and sharing unthresholded statistical maps of the human brain. Frontiers in Neuroinformatics 9.
  18. Gorgolewski, Krzysztof J., Tibor Auer, Vince D. Calhoun, R. Cameron Craddock, Samir Das, Eugene P. Duff, Guillaume Flandin, et al. 2016. The brain imaging data structure, a format for organizing and describing outputs of neuroimaging experiments. Scientific Data 3.
  19. Greshake, Bastian, Philipp E. Bayer, Helge Rausch, and Julia Reda. 2014. openSNP – A crowdsourced web resource for personal genomics. PLoS One 9 (3): e89204.
  20. Hackman, Daniel A., and Martha J. Farah. 2009. Socioeconomic status and the developing brain. Trends in Cognitive Sciences 13 (2): 65–73.
  21. Hackman, D.A., M.J. Farah, and M.J. Meaney. 2010. Socioeconomic status and the brain: Mechanistic insights from human and animal research. Nature Reviews Neuroscience 11: 651–659.
  22. Haeusermann, Tobias, Bastian Greshake, Alessandro Blasimme, Darja Irdam, Martin Richards, and Effy Vayena. 2017. Open sharing of genomic data: Who does it and why? PLoS One 12 (5): 1–15.
  23. Hanisch, Carol. 1969. The personal is political.
  24. Henrich, Joseph, Steven J. Heine, and Ara Norenzayan. 2010. The weirdest people in the world? Behavioral and Brain Sciences 33 (2–3): 61–83.
  25. Kohane, Isaac S. 2015. Ten things we have to do to achieve precision medicine. Science 349 (6243): 37–38.
  26. Kovalevskaya, Nadezda V., Charlotte Whicher, Timothy D. Richardson, Craig Smith, Jana Grajciarova, Xocas Cardama, José Moreira, Adrian Alexa, Amanda A. McMurray, and Fiona G.G. Nielsen. 2016. DNAdigest and Repositive: Connecting the world of genomic data. PLoS Biology 14 (3).
  27. Kredlow, Alexandra M., Suzanne L. Pineles, Sabra S. Inslicht, Marie France Marin, Mohammed R. Milad, Michael W. Otto, and Scott P. Orr. 2017. Assessment of skin conductance in African American and non-African American participants in studies of conditioned fear. Psychophysiology 54 (11): 1741–1754.
  28. Kummar, Shivaani, P. Mickey Williams, Chih-Jian Lih, Eric C. Polley, Alice P. Chen, Larry V. Rubinstein, Yingdong Zhao, Richard M. Simon, Barbara A. Conley, and James H. Doroshow. 2015. Application of molecular profiling in clinical trials for advanced metastatic cancers. JNCI-Journal of the National Cancer Institute 107 (4): djv003.
  29. Liddell, Belinda J., and Laura Jobson. 2016. The impact of cultural differences in self-representation on the neural substrates of posttraumatic stress disorder. European Journal of Psychotraumatology 1: 1–13.
  30. Lu, Y.-F., David B. Goldstein, Misha Angrist, and Gianpiero Cavalleri. 2014. Personalized medicine and human genetic diversity. Cold Spring Harbor Perspectives in Medicine 4 (9): a008581.
  31. Madan, Christopher R. 2017. Advances in studying brain morphology: The benefits of open-access data. Frontiers in Human Neuroscience 11.
  32. Mao, Qing, Serban Ciotlos, Rebecca Yu, Madeleine P. Zhang, Robert Chin Ball, Paolo Carnevali, Nina Barua, et al. 2016. The whole genome sequences and experimentally phased haplotypes of over 100 personal genomes. GigaScience 5 (1).
  33. Marigorta, Urko M., and Arcadi Navarro. 2013. High trans-ethnic replicability of GWAS results implies common causal variants. PLoS Genetics 9 (6): e1003566.
  34. Marshall, Narcis A., Hilary A. Marusak, Kelsey J. Sala-Hamrick, Laura M. Crespo, Christine A. Rabinak, and Moriah E. Thomason. 2018. Socioeconomic disadvantage and altered corticostriatal circuitry in urban youth. Human Brain Mapping 39 (5): 1982–1994.
  35. Martínez, Karen G., José A. Franco-Chaves, Mohammed R. Milad, and Gregory J. Quirk. 2014. Ethnic differences in physiological responses to fear conditioned stimuli. PLoS One 9 (12).
  36. McAllister, Bryant F. 2017. Exponential growth of the AncestryDNA database.
  37. Moon, Hojin, Hongshik Ahn, Ralph L. Kodell, Songjoon Baek, Chien-ju Lin, and James J. Chen. 2007. Ensemble methods for classification of patients for personalized medicine with high-dimensional data. Artificial Intelligence in Medicine 41: 197–207.
  38. Niso, Guiomar, Christine Rogers, Jeremy T. Moreau, Li Yuan Chen, Cecile Madjar, Samir Das, Elizabeth Bock, et al. 2016. OMEGA: The Open MEG Archive. NeuroImage 124: 1182–1187.
  39. Niso, Guiomar, Krzysztof J. Gorgolewski, Elizabeth Bock, Teon L. Brooks, Guillaume Flandin, Alexandre Gramfort, Richard N. Henson, et al. 2018. MEG-BIDS, the brain imaging data structure extended to magnetoencephalography. Scientific Data 5: 180110.
  40. Parker, Nadine, Angelita Pui-Yee Wong, Gabriel Leonard, Michel Perron, Bruce Pike, Louis Richer, Suzanne Veillette, Zdenka Pausova, and Tomas Paus. 2017. Income inequality, gene expression, and brain maturation during adolescence. Scientific Reports 7 (1): 7397.
  41. Pink, Daniel H. 2005. Why the world is flat. WIRED.
  42. Poldrack, Russell A., Deanna M. Barch, Jason P. Mitchell, Tor D. Wager, Anthony D. Wagner, Joseph T. Devlin, Chad Cumba, Oluwasanmi Koyejo, and Michael P. Milham. 2013. Toward open sharing of task-based fMRI data: The OpenfMRI project. Frontiers in Neuroinformatics 7.
  43. Poline, Jean-Baptiste, Janis L. Breeze, Satrajit Ghosh, Krzysztof Gorgolewski, Yaroslav Halchenko, Michael Hanke, Christian Haselgrove, et al. 2012. Data sharing in neuroimaging research. Frontiers in Neuroinformatics 6 (9).
  44. “Problems with 23andMe ancestry composition”. 2015.
  45. Roberts, A.L., S.E. Gilman, J. Breslau, N. Breslau, and K.C. Koenen. 2011. Race/ethnic differences in exposure to traumatic events, development of post-traumatic stress disorder, and treatment-seeking for post-traumatic stress disorder in the United States. Psychological Medicine 41 (1): 71–83.
  46. Smith, Richard. 2012. Stratified, personalised, or precision medicine. The BMJ Opinion.
  47. Tan, Cong, and Xiang Du. 2012. KRAS mutation testing in metastatic colorectal cancer. World Journal of Gastroenterology 18 (37): 5171–5180.
  48. Tzovara, Athina, Christoph W. Korn, and Dominik R. Bach. 2018. Human Pavlovian fear conditioning conforms to probabilistic learning. PLoS Computational Biology 14 (8): e1006243.
  49. “UK10K”. 2018.
  50. Vorhaus, Don. 2010. The past, present and future of DTC genetic testing regulation. Genomics Law Report.
  51. Weissman, David G., Rand D. Conger, Richard W. Robins, Paul D. Hastings, and Amanda E. Guyer. 2018. Income change alters default mode network connectivity for adolescents in poverty. Developmental Cognitive Neuroscience 30: 93–99.
  52. Wetterstrand, K.A. 2018. DNA sequencing costs, data from the NHGRI genome sequencing program.

Copyright information

© The Author(s) 2019

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  • Bastian Greshake Tzovaras (1, 2)
  • Athina Tzovara (3)
  1. Lawrence Berkeley National Laboratory, Berkeley, USA
  2. Open Humans Foundation, Sanford, USA
  3. Helen Wills Neuroscience Institute, University of California, Berkeley, USA
