Sample management: a primary critical starting point for successful omics studies

Biological samples collected from cohort studies are widely used in molecular genetic studies and are typically stored long term for future applications such as omics analyses. Sample availability is determined by proper sample handling, which is of primary importance for successful omics studies. However, questions have been raised about whether samples in long-term storage are actually suitable for omics experiments, because the quality and availability of such samples remain unknown until they are used. From that perspective, several guidelines for proper sample management have been suggested. In addition, several researchers have used mock samples to assess how improper management damages samples and have suggested a set of requirements for sample handling. In this review, we present several considerations for handling samples so that they remain eligible for omics studies. Focusing on birth cohorts, we describe the types of collected samples from which omics data were generated. This review ultimately aims to provide proper guidelines for sample handling for successful human omics studies.


Introduction
'Omics' is a compound suffix combining '-ome' and '-ics', which mean 'totality' and 'discipline', respectively. Omics refers to disciplines that study the entirety of a given biological level, such as genomics, epigenomics, or transcriptomics. It aims to elucidate complex biological regulation relevant to human health and disease states. In particular, because omics levels interact with each other, the importance of multi-omics approaches is growing (Koh and Hwang 2019; Ling et al. 2021; Sohn et al. 2021; Yu et al. 2021).
The exposome describes the totality of environmental exposures and their effects on health over the lifetime (Wild 2005). It helps us comprehend how barely recognizable but persistent environmental exposures in everyday life (Ha et al. 2021; Kim et al. 2021) slowly affect our health by altering molecular processes (Rim and Kim 2020). The exposome is usually studied through cohort studies targeting specific populations. For example, birth cohorts recruit pregnant women and their offspring to observe maternal effects delivered to the foetus, namely the effects of maternal exposure to environmental toxicants on foetal development and lifelong health risks. Cohort studies necessarily involve acquiring various resources, such as clinical information, questionnaires, and biological samples (Harati et al. 2019), which are often stored long term for future applications.
Nucleic acids such as DNA and RNA isolated from biological samples are the starting materials for molecular experiments. Because properly managed samples yield intact materials from which reliable data are generated, sample handling strategies are critical determinants of successful omics studies. However, because the omics-based exposome has been recognized as an essential approach only relatively recently, the sample management of various cohorts may not fulfil the requirements.

In this review, we introduce recommendations for acquiring optimal biological samples and derived molecular materials. We further discuss the possibilities of omics-based exposome studies through the examples of several birth cohorts worldwide. Ultimately, we provide practical suggestions for sample management, leading to successful human studies.

Prenatal exposome
Environmental exposure over the lifetime begins in utero, because maternal environmental exposure is transmitted to the foetus. The transmitted toxicity could harm the foetus, affecting early development through to lifelong health or disease outcomes.

Establishment of birth cohorts
One of the purposes of birth cohorts is to understand the prenatal exposome. Mothers provide biological samples and information about their health and exposures. Upon delivery, clinical information and biological samples of the newborns are collected. These resources can be used for exposome studies. In particular, biological samples are used to measure exposure levels in the body and to prepare molecular materials for omics studies.

Biological sample preparation
Collection
Typically, urine, blood, and solid tissues, among others, are collected in cohorts (Elliott and Peakman 2008). Urine, which contains low amounts of nucleic acids, is rarely used to prepare molecular materials; rather, it is used to measure the amounts of environmental chemicals metabolized in the body. Blood and solid tissues contain large amounts of nucleic acids as well as traces of environmental chemicals persisting in the body. Because tissue must be biopsied whereas blood can be acquired with minimal invasion, blood is the more accessible sample in the general population. As an exception, in birth cohorts, placental tissue can be obtained without harming the mother or newborn, because the placental disc is naturally expelled from the maternal body and separated from the newborn. Cord blood and placental tissue are invaluable resources for understanding the prenatal exposome, as they retain evidence of material exchange between the mother and foetus (Rísová 2019).

Preprocessing
Biological samples are usually deep-frozen for future use, and prior to freezing they must be properly processed to maintain their integrity. Because blood is a fluid, it is vulnerable to freezing: red blood cells burst during freezing and thawing, which irreversibly damages sample integrity. Therefore, blood needs to be treated carefully before freezing. First, it can be separated into individual components such as peripheral blood mononuclear cells (PBMCs), which are considered optimal for omics studies because they contain the majority of nucleic acids while inhibitors are removed. Second, blood can be collected in tubes designed for specific purposes. For example, ethylenediaminetetraacetic acid (EDTA) tubes and heparin tubes prevent blood coagulation by chelating Ca²⁺, a main cofactor for coagulation proteins, and by activating antithrombin, respectively. The former is suitable for measuring trace elements, isolating DNA, and occasionally separating PBMCs. The latter is used for chemistry tests or PBMC isolation but is not recommended for polymerase chain reaction (PCR), because heparin inhibits the reaction. For PBMC isolation, heparinized blood is layered onto a cell separation medium such as Ficoll-Paque, followed by centrifugation, and the PBMCs in the middle layer between the medium and the plasma are harvested. Alternatively, a CPT tube contains a gel-based cell separator, so blood collected in it is separated into layers directly by centrifugation. Because the two PBMC isolation procedures are almost identical except for whether the cell separator is built into the tube, researchers can choose a tube depending on time, cost, or experimental efficiency. According to our tentative estimation, separating 1 mL of blood with a CPT tube costs approximately 13.8 times as much as using a heparin tube with Ficoll.
Lastly, PAXgene Blood RNA tubes and Tempus Blood RNA tubes contain a liquid reagent for preserving RNA, which deliberately induces cell lysis and immediately inactivates RNA-damaging factors such as RNases. Likewise, tissue must be washed to remove impurities that can damage the sample. Although freezing is the gold standard for long-term sample storage, repeated freezing and thawing greatly increases the risk of sample damage. Therefore, samples should be divided into single-use amounts, for example by aliquoting blood or cutting tissue into pieces stored in separate cryovials.
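The tube recommendations described above can be summarized as a small lookup, sketched here in Python. This is only an illustration under stated assumptions: the purpose labels (`trace_elements`, `dna`, `pbmc`, `chemistry`, `rna`, `pcr`) and the names `Tube`, `TUBES`, and `tubes_for` are hypothetical, and the table encodes only what this section states, not manufacturer specifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tube:
    """One blood collection tube type and the uses described in the text."""
    name: str
    suitable_for: frozenset              # purposes the text lists for this tube
    avoid_for: frozenset = frozenset()   # downstream uses the text advises against


# Hypothetical encoding of the recommendations in this section only.
TUBES = (
    # EDTA: trace elements, DNA isolation, and (occasionally) PBMC separation.
    Tube("EDTA", frozenset({"trace_elements", "dna", "pbmc"})),
    # Heparin: chemistry tests or PBMC isolation; heparin inhibits PCR.
    Tube("heparin", frozenset({"chemistry", "pbmc"}), frozenset({"pcr"})),
    # CPT: built-in gel separator for harvesting PBMCs.
    Tube("CPT", frozenset({"pbmc"})),
    # PAXgene / Tempus Blood RNA tubes: RNA preservation.
    Tube("PAXgene/Tempus", frozenset({"rna"})),
)


def tubes_for(purpose, downstream=()):
    """Return names of tubes listed for `purpose` that are not
    discouraged for any of the planned `downstream` applications."""
    blocked = set(downstream)
    return [
        tube.name
        for tube in TUBES
        if purpose in tube.suitable_for and not (blocked & tube.avoid_for)
    ]


if __name__ == "__main__":
    print(tubes_for("pbmc"))
    print(tubes_for("pbmc", downstream=["pcr"]))
```

Keeping the discouraged uses separate from the suitable ones lets a planned downstream assay (for instance PCR) exclude a tube that is otherwise acceptable for the collection purpose.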
However, complying with these procedures often poses personnel, temporal, spatial, and financial challenges (Harati et al. 2019), especially in birth cohorts, where urgent delivery situations arise frequently. Furthermore, because deep-frozen samples usually remain untouched until their future application, their quality and suitability for omics studies are uncertain. That is, problems in using biological samples from pre-existing cohorts for omics studies are now emerging. To address this, some researchers have indirectly assessed sample quality using independently collected mock samples. For example, Hebels et al. (2013) analysed the impact of handling and long-term storage on the suitability of blood-derived samples for transcriptomics, epigenomics, proteomics, and metabolomics. In our previous study, we assessed how the degradation of RNA isolated from frozen blood is affected by collection tube type, time, temperature, and duration of long-term freezing (Koh et al. 2021a, b). The applicability of the affected RNA to transcriptomics was further determined through RNA sequencing.

Genomic deoxyribonucleic acids (gDNAs)
Genomic and epigenomic studies use gDNA as the starting material. In genomics, insertions, deletions, or substitutions at particular locations are identified by comparison with a reference genome or with counterparts within the population (Bagaria et al. 2021). In epigenomics, methylation patterns of cytosine bases are quantitatively assessed. Because gDNA is structurally stable, obtaining it from frozen samples at a quality suitable for downstream applications is relatively straightforward (Wolfe et al. 2014).

Ribonucleic acids (RNAs)
RNA is used to profile gene expression or to examine post-transcriptional modifications such as alternative splicing. Unlike gDNA, RNA is fragile because of its unstable structure. In particular, freezing bursts blood cells, whereby RNA is degraded unless the blood has been treated with an RNA stabilizer. According to our estimation, however, PAXgene or Tempus tubes containing such a stabilizer are approximately 50 times more expensive than the more commonly used EDTA tubes. As a more economical alternative, TRIzol, an RNA isolation reagent, used instead of the specialized tubes preserves RNA with comparable efficiency (Ma et al. 2010; Schwochow et al. 2012). It should be kept in mind that these reagents act chemically and are designed for RNA studies, not for other types of molecular materials. Another option is to isolate PBMCs from fresh blood prior to freezing. Because PBMCs are physically separated from various damaging factors, we consider them well suited to almost every type of molecular material.
Like blood samples, tissue specimens should also be properly prepared. For example, it is recommended that RNase-rich placental tissue (Haimov-Kochman et al. 2006) be rinsed with phosphate-buffered saline to remove maternal blood and other impurities and then soaked in an RNA stabilizer such as RNAlater (Wolfe et al. 2014). Unlike blood, the cells in tissue cannot be readily separated; therefore, removing inhibitors through proper washing and inactivating them with stabilizers are considered the optimal processes for maintaining tissue integrity.

Chromatin
Histone modification is an epigenomic alteration that occurs on histone proteins without altering the DNA sequence and is involved in the regulation of gene expression. For example, tri-methylation of lysine 4 and lysine 27 of histone H3 activates and represses gene expression, respectively (Karlić et al. 2010). DNA wrapped around the modified nucleosomes is selectively obtained by chromatin immunoprecipitation (ChIP) using specific antibodies for quantitative analyses. Because nucleosome structures are retained in undamaged cells, the majority of ChIP protocols suggest cells or tissues as starting materials. Moreover, because ChIP involves considerable sample loss, it requires a large number of cells (10⁶ to 10⁷) as starting material (Park 2009), which can be obtained from 5 to > 10 mL of blood (PBMC isolation methodology 1 2021; PBMC isolation methodology 2 2021). This is a large volume, considering that < 500 μL of blood yields sufficient gDNA or RNA for downstream applications. However, because human samples are finite, many researchers are advancing ChIP methodologies that use lower numbers of cells (Acevedo et al. 2007; Brind'Amour et al. 2015; Gilfillan et al. 2012).
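The relationship between the number of cells a ChIP protocol requires and the blood volume to be drawn can be sketched as simple arithmetic. The Python snippet below is a minimal illustration, not a validated planning tool: the function name `blood_volume_ml`, the `recovery` parameter, and the example yield of 10⁶ PBMCs per mL are assumptions introduced here, and the actual yield must be taken from the isolation protocol in use.

```python
def blood_volume_ml(cells_needed, pbmc_per_ml, recovery=1.0):
    """Estimate the blood volume (mL) needed to obtain `cells_needed` PBMCs.

    cells_needed : target number of cells for the assay (e.g. 1e6 to 1e7 for ChIP)
    pbmc_per_ml  : expected PBMC yield per mL of blood (assumed, protocol-dependent)
    recovery     : fraction of cells retained after handling losses, in (0, 1]
    """
    if cells_needed <= 0 or pbmc_per_ml <= 0:
        raise ValueError("cells_needed and pbmc_per_ml must be positive")
    if not 0 < recovery <= 1:
        raise ValueError("recovery must be in the interval (0, 1]")
    return cells_needed / (pbmc_per_ml * recovery)


if __name__ == "__main__":
    # Illustrative placeholder yield only; not a measured value.
    assumed_yield = 1e6
    for target in (1e6, 1e7):
        volume = blood_volume_ml(target, assumed_yield)
        print(f"{target:.0e} cells -> {volume:.1f} mL of blood")
```

Making the recovery fraction explicit is a design choice: it keeps expected handling losses visible when deciding how much blood to reserve per participant.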

Birth cohort profiles for omics studies
Many cohorts are pursuing the prenatal exposome by making full use of their resources. First, the Human Early-Life Exposome (HELIX) project (Vrijheid et al. 2014) integrated six pre-existing birth cohorts in six European countries (United Kingdom, France, Spain, Lithuania, Norway, and Greece) and further established a sub-cohort within each original cohort. The two types of cohorts in HELIX complemented each other to realize a multi-layer exposome: exposure models across Europe were created from the original cohorts using their large-scale data, and multi-omics data were generated from the sub-cohorts using their biological samples (Vives-Usano et al. 2020). Second, the Korea Exposome Centre (funded by the Korean Ministry of Environment from 2017 to 2021) aims to understand the prenatal exposome through multi-omics analyses. Samsung Medical Centre, a member of the Korea Exposome Centre, established the Growing Children's Health and Evaluation of Environment (GREEN) cohort. Using biological samples from mothers, newborns, and followed-up children, the centre measured exposure levels of over 100 chemicals (Cha et al. 2021; Cho et al. 2019) and generated multi-omics data. The GREEN cohort is considered a pioneer of prenatal exposome research with broad, multi-layered approaches.
[Table 1 fragment: Blood (mother, cord and child); Urine (child); Hair (mother and child); Nasal swab (child)]
In addition to the representative examples above, diverse birth cohorts collect and use biological samples for prenatal exposome research (Table 1). Because it is easily accessible, blood was acquired in almost every cohort. However, the extent to which the blood can be used may differ according to how it was processed. Owing to its structural stability, gDNA remains sufficiently intact in frozen samples unless they have been exposed to heat or to physical or chemical stress, and this advantage has enabled many cohorts to obtain single nucleotide polymorphism (SNP) or DNA methylation data. For example, the Pregnancy and Childhood Epigenetics (PACE) consortium (Felix et al. 2018) brought together several cohorts that had generated their own DNA methylation data with the HumanMethylation450K array (Illumina, CA, USA). The consortium set out to understand DNA methylation patterns altered by prenatal exposure. In its early phase, it focused on maternal smoking during pregnancy (Joubert et al. 2016) and gradually expanded its interests to air pollution, maternal body mass index, and maternal alcohol consumption. The expanded consortium continues to work actively on identifying methylation patterns significantly altered by prenatal exposures.

Conclusion
The completion of the Human Genome Project and the introduction of high-throughput techniques have enabled us to understand human phenomena on a molecular basis. Accordingly, uncovering the exposome together with omics studies has been introduced as an advanced paradigm of risk assessment for environmental exposure. However, traditional approaches have been largely restricted to statistical comparisons using exposure levels measured in biological samples. That is, many cohorts established before the exposome concept was introduced are at higher risk of not having properly considered the sample quality required for obtaining molecular materials. As older cohorts, after prolonged resource collection and observation, reach the point of practical application, sample limitations are now being revealed, just as the exposome has come to be regarded as an essential approach. Using unqualified samples fundamentally undermines reliable data interpretation, regardless of the precision of downstream processing. Therefore, appropriate sample management is a critical starting point for successful omics studies. Fortunately, gDNA is relatively easy to obtain, and researchers can exploit it fully. Obtaining RNA or chromatin, however, requires a cautious approach, and pre-validation is highly recommended. From that perspective, we reviewed desirable sample handling criteria for successful omics studies, focusing specifically on the prenatal exposome. However, because almost all human phenomena operate through molecular mechanisms, sample issues are not restricted to the exposome or birth cohorts but extend to diverse targets. Therefore, researchers planning human omics studies should establish solid pipelines for sample management in advance, during the initial study design, and we hope this review provides a useful guideline for initiating and implementing successful studies.