Background

India is home to the world’s largest number of adolescents and young people (10–24 years old), comprising about a third (>400 million) of its population [1]. Nearly 20% of young people experience a mental health condition, bearing a disproportionately high burden of mental morbidity [2,3,4]. The Consortium on Vulnerability to Externalizing Disorders and Addictions (cVEDA) is a multi-site, international, collaborative, cohort study in India, setup to examine the interactions of environmental exposures, and genomic influences on neurodevelopmental trajectories and downstream vulnerability to psychopathology, with a specific focus on externalizing spectrum disorders. cVEDA spans seven recruitment centres, representing different geographical, physical and socio-cultural environments, across India. This paper presents the background, rationale and protocol of the cVEDA study.

The need for longitudinal studies using dimensional, multi-modal measures to study the etiopathological basis of psychiatric morbidity

Birth-cohort studies show that >70% psychiatric illnesses in adults begin before the age of 18 years [5]. Temperamental and psychological disturbances during childhood are prominent predictors of psychiatric disorders in adult life [6]. Childhood onset psychiatric disorders often extend into adulthood on a homotypic or heterotypic continuum [7]. Psychiatric morbidity is increasingly believed to be predicated upon a shared genetic and neurodevelopmental continuum, suggested by commonalities in clinical phenotype, neuroimaging characteristics, neuropsychological impairments, and environmental risk factors [8, 9]. Different psychiatric disorders rather than being discrete categories, possibly reflect differences in timing, severity and patterns of genetic expression and neurodevelopmental deviations [10]. Longitudinal studies over the developmental lifespan are ideally suited to study origins of psychiatric morbidity by tracking developmental trajectories, identifying deviations, and studying how deviations, and their interactions with genes and environment, relate to psychopathology [11,12,13]. Additionally, a combination of neuroimaging, neuropsychological, toxicological, and psychometric modalities better facilitates construction of models with high predictive power [14].

Role of the ‘exposome’ in determining psychiatric morbidity and the need to study it in low and middle-income countries

Psychiatric disorders have complex, multi-factorial, poly-gene-environmental etiologies [15]. Gene x environment correlation (rGE) and interaction (GxE) are two broad mechanisms that underlie this complex interface [16]. The exposome [17] includes general external environment (socio-economic, habitat), specific external environment (pollutants, infectious agents, substance use) and internal environment (physical activity, oxidative stress, etc.) [18]. Table 1 depicts the time-dependent impact of environmental exposures on developmental and psychopathological outcomes [19, 20].

Table 1 Developmental and psychopathological impact of environmental exposures over different life stages

The exposome, which can potentially modify genetic expression, is multi-cultural. Extrapolating findings from high-income settings to low and middle-income countries (LMIC) is problematic [47]. Ethnic backgrounds, in a diverse country like India [48], and the prevalence of exposures and outcomes pertinent to development and health vary by income group and geography [49,50,51]. Certain environmental risk factors (nutritional stress, environmental neurotoxins and culturally dependent forms of psychosocial stress) are largely specific to developing societies. Maternal malnutrition, suboptimal breast-feeding, childhood malnutrition, unsafe water, poor sanitation, indoor smoke, and high-risk behaviors are leading causes of death and disability-adjusted life years in LMIC [52, 53]. It is necessary to study the exposome, in diverse settings, transactionally and longitudinally, with attention to both distal and proximal influences [54] to understand its role in psychopathology/resilience [55]. Setting up longitudinal studies in LMIC, using measures comparable to existing studies can provide a more nuanced understanding of the gene-environment contributions to psychiatric morbidity.

The focus on externalizing disorders

Externalizing disorders are the third most prevalent class of mental disorders (after anxiety and depression) [56]. These include attention-deficit/hyperactivity disorder (ADHD), Oppositional defiant disorder (ODD), Conduct disorder (CD) in childhood, and adult ADHD, mood disorders, substance use disorders, impulse control disorders, emotionally-unstable personality disorder, and antisocial personality disorder in adulthood. They are associated with significant impairment, and health and non-health sector costs [57, 58]. A dimensional framework of “externalizing psychopathology” encompasses poor impulse control, poor attention allocation, heightened emotional reactivity, verbal and physical aggression, violation of rules, and substance abuse [59]. Individuals at ‘high risk’ for externalizing disorders have different patterns of brain activity, neuroadaptation, cognition, and externalizing temperamental traits [60,61,62,63,64,65,66]. A number of studies have documented variations in brain regional volumes [67], regional white matter integrity [68], functional blood flow characteristics [69, 70], social intelligence and corresponding functional brain activations [71], brain activation patterns during response inhibition tasks [72], differences in neurophysiological parameters like P300 and pre-exposure cognitive deficits [62]. Interestingly, network disruptions are incrementally related to externalizing symptoms and are proportional to the alcoholism family density [73]. These variations are also seen as intermediate phenotypes in individuals with externalizing disorders [74,75,76], suggesting that maturation delays, deficits and deviations predate disorder onset and hold promise as early identification markers of vulnerability. Preliminary investigations have also shown that young adults at ‘high risk’ could ‘catch-up’ on brain maturational differences, emphasizing the role of early interventions [77, 78]. The externalizing spectrum has complex, multi-factorial underpinnings (Fig. 1), strong links with sequential development of various disorders and plausibly a common inherited causality [5]. This makes them particularly interesting in the search for common genetic and neurodevelopmental vulnerabilities and moderating environmental influences.

Fig. 1
figure 1

Complex, multi-factorial underpinnings of externalizing disorders [60,61,62,63,64,65,66, 79,80,81]

Key concepts emerging from research on the etiopathological basis of psychiatric morbidity have highlighted – a genetic and neurodevelopmental continuum, a poly-gene-environmental etiopathogenesis, unique socio-cultural contexts as environmental determinants of psychopathology, variations in brain trajectories leading to different psychopathological outcomes, and the differential impacts of environmental stressors on developmental trajectories over various life-stages. It follows that studying disparate environmental influences, across the developmental lifespan, on genetically determined trajectories of multimodal brain endophenotypes, using dimensional, multi-modal measures in a longitudinal framework could uncover key etiopathological processes underlying psychiatric morbidity.

The cVEDA collaboration

The c-VEDA is a collaborative venture between researchers from India and the United Kingdom (UK), set up under a joint initiative on the aetiology and life-course of substance misuse and its relationship with mental illness, by the Medical Research Council, UK (MRC) and the Indian Council for Medical Research (ICMR). National Institute of Mental Health and Neurosciences, Bangalore (NIMHANS) and King’s College London (KCL) are the coordinating centres in India and the UK, respectively. Other participating centres from India include – (i) Post Graduate Institute of Medical Education and Research, Chandigarh (PGIMER), (ii) ICMR-Regional Occupational Health Centre (ROHC), Kolkata, (iii) Regional Institute of Medical Sciences, Imphal (RIMS), (iv) Holdsworth Memorial Hospital, Mysore (HMH), (v) Rishi Valley Rural Health Centre, Chittoor (RV), and vi) St. John’s Research Institute, Bangalore (SJRI). European collaborators include researchers who are part of major longitudinal imaging genetics studies – “Reinforcement-related behaviour in normal development and psychopathology” (IMAGEN) (https://imagen-europe.com), the “Avon Longitudinal Study of Parents and Children” (ALSPAC) (http://www.bristol.ac.uk/alspac/), and the “Study of Cognition, Adolescents and Mobile Phones” (SCAMP) (https://www.scampstudy.org).

Objectives of the cVEDA study

The cVEDA study is designed to (a) establish a cohort of about 10,000 individuals within specified age bands – 6–11, 12–17, 18–23 years; (b) Conduct detailed phenotypic characterization, with special emphasis on externalizing behaviors (temperament and disorders), in the cohort and parents of all participants; (c) Assess environmental exposures (psychosocial stressors, societal discrimination, nutrition and asset security, environmental toxins) thought to impact gene expression, brain development, temperaments and behaviors; (d) Establish a sustained and accessible data platform and a bio-resource with an integrated database to facilitate analyses and collaborations; and (e) Build research capacity by joint UK–India initiatives.

Methods/Design

Study design

The cVEDA is a cohort of individuals aged 6–23 years across 7 Indian sites. It draws upon existing research systems with well-established tracking and follow-up mechanisms, previously employed in studying the impacts of varied risk factors on non-communicable diseases [82,83,84,85]. Cohorts recruited at each centre are followed-up 1 and 2 years after baseline assessments. The study employs a planned missingness [86] (Fig. 2) approach in which participants are randomised to follow-up either 1 or 2 years after enrolment. Such a design permits three waves of data collection to be achieved while reducing the cost of measurement per wave. Since participants are randomly assigned to be present/missing at each follow-up, missing data are completely at random and hence parameters of interest can be estimated without bias. In addition, loss-to-follow up can be reduced as participants suffer less with study-fatigue and also fieldworkers can target their limited resources more effectively when encouraging participants to return.

Fig. 2
figure 2

Accelerated longitudinal with planned missingness design and the generation of developmental trajectories (latent growth model)

Since the primary interest of the study is in tracking development over almost the entire developmental lifespan there are cost/time benefits due to recruiting across a wide range of participant ages. Age-variability within-wave permits an accelerated cohort design [87] to be employed (Fig. 2). Here a given age-range can be spanned in a shorter period of time by considering the cohort as being comprised of multiple sub-cohorts each of a different age at recruitment. A variety of longitudinal statistical models, either within a Structural Equation Modelling or Multilevel Modelling framework, including latent growth models (aka mixed-effects models) can take full advantage of such data. For example, using at joint model one might examine the longitudinal interplay between alcohol use and antisocial behaviour through adolescence. In addition, these models, through their use of a maximum-likelihood approach to missing data, based on a Missing At Random assumption, can demonstrate a high level of statistical power for a fraction of the monetary and time costs of following all individuals for the whole time period. Such an analytical framework is also compatible with the planned-missingness aspect to the study design described above.

Timeline

The cVEDA study started in February 2016. After 9 months spent in study set up (staff recruitment, translations of study instruments into 7 Indian regional languages, setting up digital data capture platform, training of recruitment and assessment teams, and quality control exercises), recruitment started in October 2016. Recruitment, baseline assessments and randomized follow-up have been continued in parallel. Under the current funding cycle, we have completed recruitment and baseline assessments; and 1 and 2 year follow-ups on a part of the sample (Fig. 3).

Fig. 3
figure 3

cVEDA study timeline

Sample

Baseline Clinical Assessment: The cVEDA had a target baseline sample size of approximately 10,000 participants in three age bands: C1 (6–11 years), C2 (12–17 years) and C3 (18–23 years). As the objectives of the study were to carry out detailed phenotypic characterisation of participants, examine their environmental exposures and gene-environment interactions that may modulate brain development and affect externalizing and addictive behaviour patterns, the study sample needed to be as large as feasibly possible whilst being representative of the geographic, socioeconomic and cultural diversity of India. The target sample sizes at each recruitment site were decided based upon each site’s estimated capacity to recruit individuals over a 3-year recruitment period, given the site lead’s understanding about ground realities and experience from past studies. The sample represents five geographically, ethnically, and socio-culturally distinct regions; varied environmental risks: toxic exposures (coal-mines), slum-dwellers, socio-political conflict zones (insurgency and inter-ethnic violence); urban and rural areas; school and college attendees; and familial high risk (children of parents with substance use or other mental disorders), in order to have an adequate representation of individuals likely to convert to externalizing disorders.

The study followed non-probabilistic convenience sampling based on accessibility to potential participants in local schools, colleges, community and clinics. Exclusion criteria included legal blindness/deafness, seizure disorder active in the last 1 month, severe physical or active mental illness, refusal of consent, or inability to participate in follow-up assessments (e.g. due to migration). Individuals with specific contra-indications (metal implants, electrical devices, severe claustrophobia) were excluded from neuroimaging. Site-wise exposure characteristics and baseline sample sizes are depicted in Fig. 4. Whilst the recruitment approach was pragmatic, this type of sampling technique may not allow for results that can be generalised to the entire population. However, given the scope and breadth of exposures being assessed, this approach is useful for the target research objectives and may help generate new hypotheses for future studies [88].

Fig. 4
figure 4

cVEDA sample distribution and recruitment site characteristics

(Map of India source: http://mapsopensource.com/india-states-outline-map-black-and-white.html; As stated on the webpage “All the content by www.mapsopensource.com is licensed under a Creative Commons Attribution 3.0 Unported License”)

Biological samples and neuroimaging: Blood/buccal swab and urine samples were collected from participants at baseline. Around 15% of the baseline sample, i.e. consenting participants, underwent neuroimaging. Even at the risk of biased sampling, this strategy was adopted given the wide age range, high rate of refusal, especially in non-clinical populations, and the unavailability of research MRI scanners in 4 out of 7 sites.

Procedure

a) Phenotypic characterization

Assessments involved dimensional and categorical phenotypic characterization. The questionnaires and assessment protocols were translated (and back-translated using standard WHO protocols) from English into seven Indian languages (Hindi, Kannada, Telugu, Tamil, Manipuri, Bengali, Punjabi) for use across seven recruitment sites. Age-appropriate instruments are used to capture socio-demographic information, temperament, environmental exposures, parenting, psychiatric morbidity, and neuropsychological functioning. Table 2 details the domains of assessments, tools and protocols.

Table 2 cVEDA sample characterisation: Assessment domains, tools& protocols

Neuroimaging

Resting state fMRI (rsfMRI), Diffusion MRI (dMRI) and Structural MRI (sMRI) scans are done at baseline, and are being done for the randomized consenting participants at follow-up. Structural and rsfMRI are collected on 3 T scanners (Siemens, Germany; Philips, The Netherlands). To ensure comparability of image-acquisition techniques and ‘pool’ability of the multi-site MRI data, a set of parameters, particularly those directly affecting image contrast or signal-to-noise are held constant across sites (https://cveda.org/standard-operating-procedures/) (Table 2).

Blood/saliva samples for genetic studies

Blood samples (at least 10 ml, EDTA and Tempus tubes) are collected at baseline for DNA, RNA and plasma lead estimation, as per a Standard Operating Protocol (https://cveda.org/standard-operating-procedures/). Plasma, ‘buffy coat’ (white blood cells) and red blood cells, from centrifugation of blood samples in EDTA tubes, are transferred into labeled aliquots for storage at a central biobank. Tempus tubes and blood component aliquots are stored at − 80 °C. Samples are kept frozen at all times including during transport using temperature-controlled logistics. For participants who do not consent for a blood sample, or where it isn’t possible to obtain a blood sample (e.g. failure to identify a suitable vein for blood draw or an insufficient amount of blood sample), a buccal swab (for DNA) is taken.

Plasma and urine samples for toxicological studies

Estimation of lead in plasma, and arsenic, metabolites of tobacco (cotinine) and volatile organic compounds (VOCs), in urine samples, is incorporated in cVEDA as a measure of exposure to environmental neurotoxins. An aliquot of plasma isolated from blood samples is used for lead estimation. Lead is estimated in plasma as evidence suggest that plasma lead represents the toxicologically labile fraction of lead freely available to interact with target tissue rather than lead in whole blood [117, 118]. Mid-stream urine samples collected in sterilized and capped polythene bottles, and stored in deep freezers at −20 °C till analysis. Urine samples are analysed for total arsenic and metabolites of VOCs include trans, trans-muconic acid and s-phenyl mercapturic acid (benzene metabolites), hippuric acid (toluene metabolite), mandelic acid (ethylbenzene metabolite) and methylhippuric acid (xylene metabolite). Analytical methods for toxicological analysis are presented in Table 2. To validate the toxicological assessments in participants, environmental assessments of VOCs in ambient air and arsenic in water will also be carried out. In a phased manner, exposure to other critical developmental neurotoxins, like pesticides, phthalates and polychlorinated biphenyls, will also be assessed.

Follow-up assessments

At follow-up, assessments relevant to tracking development, changes in environmental exposures, and psychopathology are done. These include – socio-demographic and migration information, environmental exposures, MINI/MINI-KID, Strengths and Difficulties Questionnaire, ASSIST-Plus, and neuroimaging. MRI scans are repeated for those participants who underwent neuroimaging at baseline and consented for a repeat scan in follow-up. These would help track structural and functional brain growth trajectories. Blood/buccal swab samples are also collected from all consenting participants in follow-up, for epigenetic investigations.

Data analysis

Handling missing data

Attrition, difficulties in assessment within a set follow-up time frame, changes in recruitment techniques (face-to-face interviews to telephonic-based assessments), changes in reporting individuals (mother to child or vice versa), and, changes in questionnaire design are challenges in longitudinal cohorts. In cVEDA, we also have missing data from exogenous factors such as economic migration, weather catastrophes and political instability resulting in displacement of participants from one geographical area to another. These would be significant caveats to fitting longitudinal growth models for behaviour problems. “Planned missingness” improves the efficiency of longitudinal studies without compromising validity. As this missingness is by design we are able to assume that data are ‘missing completely at random’ and hence likelihood-based methods would produce unbiased effect estimates. Planned missingness can also affect the level of unplanned missingness. By reducing the number of waves of data that each participant contributes, participant-fatigue is reduced, study-dropouts are lower and, thereby, the rates of unplanned missingness.

Statistical analysis and modelling

The cVEDA will have first and foremost a descriptive analysis, as it is the first neurodevelopmental study of its kind in India with a diverse socioeconomic, geographic and cultural spread. It will present findings on key externalizing traits across the cohort with cluster analysis conducted per site to determine effects attributable to site variations. The results of cVEDA will be informed by and compared with the findings of European cohorts of children and young adults such as IMAGEN, ALSPAC, and SCAMP for behavioural traits (temperament and disorders), environmental exposures (e.g. psychosocial stressors and environmental toxins) and gene-environment interactions. Further, techniques like structural equation models [119] that estimate and converge multiple pieces of the outcome into a single latent growth or specific latent classes by age will be used to identify vulnerability factors for externalizing disorders and other mental health outcomes.

Establishment of a repository data and biobank

All cVEDA assessments are run on a digital platform using Psytools software (Delosis Ltd., UK). Data is first synchronized with the Psytools server, from where it is accessed by data management teams in India, UK and France. The final data storage server (Dell power edge R530 Rack server) is located at NIMHANS, along with a back-up safety system and a mirror data storage system in France. The data management team conducts sanity checks and feedback is regularly sent to recruitment sites.

Biological samples form part of a biobank located at NIMHANS. The databank and the biobank are a resource to facilitate research by consortium partners as well as collaborators investigating other areas of mental health, its interface with physical health, and cross-cultural comparisons.

Quality control

Quality control measures have been incorporated in the study protocol right from the beginning. Interviewer training, on-site and online, followed by mock interview assessments and feedback was conducted in the preparatory phase of the study. Following start of study on the field, India and UK based study coordinators conduct weekly recruitment meetings during which recruitment progress and completeness of data entry are reviewed. Standard operating procedures (SOPs) ensure consistency in biological samples collection and neuroimaging. Additionally, quality control procedures for neuroimaging are implemented at each site: (i) a phantom [120] is scanned to provide information about geometric distortions and signal uniformity related to hardware differences in radiofrequency coils and gradient systems, (ii) healthy volunteers are regularly scanned at each site to assess factors that cannot be measured using phantoms alone, and iii) after every subject scan, a quick 2-min script (https://github.com/cveda/cveda_mri) is run at the acquisition centre to detect any subject−/scanner-related artifacts and a decision is made if the data needs to be re-acquired. The India-based study coordinator also regular conducts site visits to monitor adherence to the study protocol.

Discussion

This paper presents the background and protocol of the cVEDA study. The study is designed to answer questions about the developmental, genomic, and environmental underpinnings of psychopathology. This has implications for preventive and early interventions for mental disorders. Biological samples collected from all participants significantly enrich exposome characterization in the sample and this will aid in discovering biomarkers of exposure and early disease through omic technologies (epigenomics, adductomics, proteomics, transcriptomics and metabolomics). Whilst it is beyond the scope of the current funding to carry out extensive omic analyses, we are establishing an integrated exposome database and biobank to facilitate future analyses. The cVEDA cohort is a substantial addition to, and provides comparative and cross-cultural datasets for international studies – IMAGEN that looks at biological and environmental factors affecting reinforcement related behaviours in teenagers; SCAMP looking at effects of mobile phone use on cognition in adolescents; ALSPAC with several decades of longitudinal data on developmental, environmental and genetic factors affecting a person’s overall health and development.

Our initial analyses have focused on establishing trajectories (behavioural, temperamental, neuropsychological) to increase our understanding of maturational brain changes and permit in-depth enquiries into the relationships between brain networks, cognition, behavior and environment during development, and how these contribute to the genesis of neurodevelopmental disorders.

Genomic DNA isolated from the buffy coat component of blood will be used for genetic studies, using next generation genotyping methods as well as genome wide epigenetic studies. RNA from samples stored in Tempus tubes will be used for transcriptome studies by arrays or RNA sequencing methods. Using statistical data reduction techniques (e.g: principal components analysis), endophenotype-genetics relationships (e.g: parallel independent component analysis), and quantitative trait loci, we will examine genetic basis of neuroimaging and neuropsychological endophenotypes.

Community engagement and capacity building

In addition to being a research initiative, cVEDA is an opportunity for community engagement. At the completion of the study we will provide a summary note to participants and an opportunity for group discussion of results, implications and treatment options, where relevant. The cVEDA consortium facilitates exchange of research, technical and statistical expertise and support with dissemination and publication of research findings via workshops and training programmes organised annually at the cVEDA investigators’ meetings at various study sites in India.

cVEDA is a first of its magnitude research exercise in India, an opportunity for research growth and capacity building. The establishment of a completely digitized data collection and transfer platform minimizes human error greatly. It also makes a large databank rapidly available to researchers working in the field. Through collaboration with institutions in the UK, research scholars also have the opportunity to partner with international researchers and faculty to build their research repertoire.

In conclusion, the cVEDA has established the largest neurodevelopmental database in India. The identification of environmental risk factors that contribute to vulnerabilities for psychiatric morbidity could have huge implications in public health interventions and prevention of psychiatric morbidity, in India. This unique database will facilitate international research collaborations, to study cross-cultural variations in the determinants of psychopathology.