The International Childhood Cancer Cohort Consortium (I4C): A research platform of prospective cohorts for studying the aetiology of childhood cancers

BACKGROUND
Childhood cancer is a rare but leading cause of morbidity and mortality. Established risk factors, accounting for <10% of incidence, have been identified primarily from case-control studies. However, recall, selection and other potential biases impact interpretations, particularly for modest associations. A consortium of pregnancy and birth cohorts (I4C) was established to utilise prospective, pre-diagnostic exposure assessments and biological samples.


METHODS
Eligibility criteria, follow-up methods and identification of paediatric cancer cases are described for cohorts currently participating or planning future participation. Also described are exposure assessments, harmonisation methods, biological samples potentially available for I4C research, the role of the I4C data and biospecimen coordinating centres and statistical approaches used in the pooled analyses.


RESULTS
Currently, six cohorts recruited over six decades (1950s-2000s) contribute data on 388 120 mother-child pairs. Nine new cohorts from seven countries are anticipated to contribute data on 627 500 additional projected mother-child pairs within 5 years. Harmonised data currently include over 20 "core" variables, with notable variability in mother/child characteristics within and across cohorts, reflecting, in part, secular changes in pregnancy and birth characteristics over the decades.


CONCLUSIONS
The I4C is the first cohort consortium to have published findings on paediatric cancer using harmonised variables across six pregnancy/birth cohorts. Projected increases in sample size, expanding sources of exposure data (eg, linkages to environmental and administrative databases), incorporation of biological measures to clarify exposures and underlying molecular mechanisms and forthcoming joint efforts to complement case-control studies offer the potential for breakthroughs in paediatric cancer aetiologic research.


INTRODUCTION
While cancer in children and adolescents is rare worldwide, it remains a leading cause of morbidity and mortality despite notable improvements in survival [1]. Established risk factors include prenatal exposure to diagnostic x-rays [2], genetic syndromes [3], and high birthweight [4], which, combined, account for less than 10% of childhood cancer (CC) incidence [5]. More recently, pooled case-control studies of childhood leukemia (CL) have suggested modestly increased risks associated with residential painting, pesticide use and pre-labor caesarean delivery [6] [7] [8], and slightly decreased risks associated with day care attendance, extended breastfeeding, and maternal vitamin and folic acid supplement use [9] [10].
Known and suspected risk factors for CC [2] are briefly summarized in Appendix A.
Timing of exposure appears to be associated with variable CC risks, with prenatal and early postnatal periods being particularly vulnerable windows [2] [11]. Increasing recognition of etiologic differences by subtype [2] underscores the need for case-control studies evaluating large numbers of distinct CC entities. While well-designed case-control studies can yield valid estimates, inherent limitations such as recall bias (differential recall of past exposures by case versus control mothers), selection bias (differential participation according to characteristics such as educational level or exposure status of cases compared with controls) and reverse causality may affect risk estimates and interpretation.
To complement and address methodologic limitations of case-control studies, pooling of multiple pregnancy/birth cohorts, such as those involved in the International Childhood Cancer Cohort Consortium (I4C), could verify case-control study findings, identify new risk factors and identify mechanisms of carcinogenesis [12,13]. Biospecimens collected prospectively are an advantage of prospective pregnancy/birth cohort studies for exploring CC etiology, although a few case-control studies have accessed archived pre-diagnostic newborn blood spots [14] or cord blood [15].
Our objective is to report on the progress made by the I4C, furthering the description of Brown et al [16], in developing a platform through a collaborative network that provides access to repeated exposure 'measurement' data and biospecimens. We also describe challenges and future directions including collaborations with a consortium of case-control studies.

Overview, structure and operations
The overarching goal of the I4C is to understand the etiology and mechanistic underpinnings of CC by exploiting prospectively collected exposure and biomarker data.
Currently contributing cohorts. Six cohorts currently contribute data on cancer cases, exposure data and biospecimens (if available) as described in Table 1a; more details are available in the published cohort descriptions.

Data sharing
Data sharing and material transfer agreements for the I4C were developed and approved by the MCRI Ethics Committee and sent to cohort investigators for approval by their Ethics Committees. Only anonymized data were requested (see Appendix B).

Follow-up methods
Strategies and time points for follow-up varied (Table 1a); follow-up rates were around 60-70% for most cohorts at ≥ 7 years postnatal.

CC case ascertainment and classification
Ascertainment. For participating cohorts, identification of CC cases has been reliant on linkage to national (ALSPAC, DNBC, MoBa and JPS) or state (TIHS) cancer registries except for CPP. The latter relied on medical records [17] and indirect methods [18]. Each potential cancer diagnosis in the CPP was reviewed by two board-certified pediatricians.
Classification. To date, age at diagnosis for CC has been < 15 years but, going forward, will extend to < 20 years. Tumors were classified into six major groups based on the International Classification of Diseases for Oncology (ICD-O) Third Edition [19]. For cohorts with IRB approval to access more detailed information, the following was provided: gender, showed 94% of cancers ascertained during 1971-89 [21]. Since CPP cases were identified through indirect methods, some cancer cases may have been missed.

Exposure data
Identification of data domains and specific variables associated with CC. Thirty exposure domains were established for key exposures (e.g. birthweight, folic acid supplements, and others; see Tables 3a-c). The IDCC will submit requests to obtain additional data if needed for future proposals (see Appendix C for details of the process). While the main domains center around the mother and child (see Tables 3a and 3c), some information on fathers is also available (see Table 3b).

Identification of additional emerging cohorts
Two groups of emerging cohorts are currently involved in I4C activities but not as yet contributing cancer cases, exposures or biospecimens to the pool. These are detailed in

Housing of Data at the IDCC: Platform, Confidentiality, Privacy and Security Measures
The data transferred to the IDCC is securely housed on a web-based application located on the MCRI's secure e-Research portal (see Appendix F). Access is restricted to authorized personnel following approval by the I4C Steering Committee and a representative from each study contributing to the pooled dataset.
For added security, data files are encrypted before being sent to the IDCC. Most studies have excluded unique personal identifiers (e.g., name, residential address) and some have excluded month and day of birth. Individuals are identified by a study-specific identification number, and additional security is provided by assigning a unique I4C identification number used as the primary identifying key. The electronic data are stored at the IDCC on a secure, password-protected server. The network server, web server and SQL server undergo nightly incremental backups plus a monthly full backup to tape for off-site storage. All users of the data must comply with the data sharing agreements.
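The two-level identification scheme described above can be illustrated with a minimal sketch. It assumes that each record arrives with a cohort name and a study-specific identification number; the function name `assign_pooled_ids` and the `I4C-0000001` identifier format are hypothetical and are not taken from the IDCC's actual system.

```python
from itertools import count


def assign_pooled_ids(records, prefix="I4C"):
    """Map each (cohort, study_id) pair to one sequential pooled identifier.

    The prefix and zero-padded format are illustrative assumptions only.
    """
    counter = count(1)
    mapping = {}
    for cohort, study_id in records:
        key = (cohort, study_id)
        # The same study-specific ID may occur in different cohorts, so the
        # cohort name is part of the key; repeated pairs reuse their ID.
        if key not in mapping:
            mapping[key] = f"{prefix}-{next(counter):07d}"
    return mapping


# Hypothetical study-specific IDs; note the same number in two cohorts.
records = [("DNBC", "000123"), ("MoBa", "000123"), ("DNBC", "000123")]
pooled_ids = assign_pooled_ids(records)
```

Keying on the cohort as well as the study-specific number keeps identical numbers from different cohorts distinct in the pooled dataset.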

Statistical consultation and support on study designs, data harmonization, and analyses
The I4C statistical team includes two senior biostatisticians (SL, GP) who provide input and advice on research proposals and undertake statistical analyses using the pooled dataset.
While complete harmonization of all questionnaire data is not feasible given cohort differences, decisions on pooling are based on the specific research question and what could be pooled with minimal compromise to the original recorded data. Further details and strategies are in Appendix G.
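As a rough illustration of what harmonising a "core" variable can involve, the sketch below maps birthweight and child sex from differently coded source records onto common variables. The field names (`birthweight_g`, `birthweight_lb`, `birthweight_oz`, `sex`) and the sex codings are hypothetical assumptions; the actual cohort codings and harmonisation rules are those described in Appendix G, not the ones shown here.

```python
GRAMS_PER_OUNCE = 28.349523125
OUNCES_PER_POUND = 16

# Hypothetical source codings for child sex mapped to a common label.
SEX_CODES = {
    "1": "male", "2": "female",
    "m": "male", "f": "female",
    "male": "male", "female": "female",
}


def birthweight_kg(record):
    """Return birthweight in kilograms, or None if it was not recorded."""
    if "birthweight_g" in record:
        return record["birthweight_g"] / 1000
    if "birthweight_lb" in record:
        ounces = (record["birthweight_lb"] * OUNCES_PER_POUND
                  + record.get("birthweight_oz", 0))
        return ounces * GRAMS_PER_OUNCE / 1000
    return None


def harmonise(record):
    """Map one source record onto the common variables."""
    raw_sex = str(record.get("sex", "")).strip().lower()
    return {
        "birthweight_kg": birthweight_kg(record),
        # Unrecognised or missing codes become None instead of being guessed.
        "sex": SEX_CODES.get(raw_sex),
    }


metric = harmonise({"birthweight_g": 3500, "sex": 1})
imperial = harmonise({"birthweight_lb": 7, "birthweight_oz": 8, "sex": "F"})
```

Leaving unrecognised values as missing rather than recoding them is one way to pool with minimal compromise to the originally recorded data.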
As harmonization proceeded, emerging cohorts requested information about data collection strategies and forms to facilitate future pooling of data. In response, the IDCC has developed a "New Cohort Protocol Support Package (NCPS)" to provide researchers with a standardized format for the collection of exposure data for etiologic studies (see Appendix H).
Publications. The first I4C publication using a pooled dataset examined the association between birthweight and risk of CC, with maternal adiposity measures as potential effect modifiers. A linear relationship was demonstrated for increasing risk of total CC and childhood leukemia with each kilogram increase in birthweight, adjusted for gender and gestational age. No significant interactions were seen with maternal pre-pregnancy overweight or pregnancy weight gain. Birthweight >4.0 kg was linked with non-leukemia cancers, but only among children diagnosed at age three or older [4].
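One common way to express a per-kilogram trend of this kind is a proportional hazards model, sketched below. The model form is an assumption made for illustration; the text above does not state which regression model was fitted in [4]. Here $h_0(t)$ is an unspecified baseline hazard, $\mathrm{BW}$ is birthweight in kilograms, and $\mathrm{GA}$ is gestational age.

```latex
\[
  h(t \mid \mathrm{BW}, \mathrm{sex}, \mathrm{GA})
    = h_0(t)\,\exp\bigl(\beta_1\,\mathrm{BW} + \beta_2\,\mathrm{sex} + \beta_3\,\mathrm{GA}\bigr),
  \qquad
  \mathrm{HR}_{\text{per kg}} = e^{\beta_1}.
\]
```

Under this formulation, a linear relationship means that each additional kilogram of birthweight multiplies the hazard by the same factor $e^{\beta_1}$, with sex and gestational age held fixed.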
I4C members have described a new optimized method for extracting DNA from neonatal dried blood spots for application in methylome profiling [26] [27] using samples from several of the contributing cohorts. A review paper describes the characteristics of the epigenome as a key component of fetal exposure in evaluating in utero exposures and childhood cancer risk [28]. More recently, I4C members have begun cataloguing -omics signatures of early-life factors that could be associated with CC [29,30]. These signatures will be analyzed across the different I4C cohorts with available biological samples. This work will complement the I4C questionnaire-based epidemiological investigations and may provide mechanistic insights into CC etiology.
Ongoing data analyses. Current efforts are focused on: prospectively examining the association of birth order with CL and the potential modifying roles of paternal age and birthweight; parental occupational exposure to pesticides, animals, and organic dust and risk of CC, utilizing geocoded residential addresses (using DNBC for the first analysis) to evaluate pesticide use near the residences during the pregnancy as well as parental occupational exposure; prenatal maternal folic acid supplementation and risk of CC; maternal infections during pregnancy and CC; and epigenetic precursors of CL.

Process for requesting data for new research proposals
The I4C Steering Committee facilitates data sharing provided that all approvals are in place.
The process for requesting data from any of the I4C contributing cohorts and the parallel steps undertaken at the IDCC to provide the data are in Appendix B.

COMMENTS
The I4C is a valuable resource comprising both questionnaire-based epidemiological data and biological samples, offering unique opportunities to advance our understanding of the etiology of CC. There is a critical role for prospective assessment of exposure using pre-diagnostic questionnaire data and biological samples, but the rarity of CC and identification of an expanding number of molecularly different CC subtypes underscore the strengths and limitations of the I4C. Pooling of multiple pregnancy and birth cohorts offers prospectively collected risk factor and mechanistic data complementary to those obtained from case-control studies. For example, information about maternal diet, viral infections, and use of folic acid and other vitamin supplements periconceptionally or during pregnancy may not be accurately recalled or available in medical records and thus not captured well in case-control studies. Relatively minor infections during infancy, details of breast-feeding, and daycare may similarly not be accurately recalled years later. Despite these potential strengths, cohort studies may also suffer from methodologic shortcomings including selection bias (cohort members are generally volunteers), under-ascertainment or misclassification of cancer outcomes, loss to follow-up over time, limited time points of data collection and measurement error (depending on the exposure assessment methods and follow-up time periods). By jointly undertaking projects with investigators leading case-control studies, the strengths of each study design can be maximized and the limitations and potential biases can be identified and quantified.

Future Directions
The I4C includes a growing number of participating cohorts and is poised to significantly increase its sample size within the next five years. I4C studies are incorporating a growing range of exposure assessment methods and tools, including Geographic Information Systems (GIS) to assess agricultural and pesticide exposures near residences, satellite measurements of ambient ultraviolet radiation and assignment of occupational exposures using job exposure matrices. Statistical approaches include sophisticated methods for quantifying temporal and age effects in the assessment of associations between exposure and outcome. Collaborative efforts have recently been undertaken to develop joint projects with the Childhood Leukemia International Consortium during future planned joint meetings. The prospects for combining multiple sources of pre-diagnostic exposure data and biological samples, in conjunction with collaboration with other birth cohort and pediatric cancer case-control consortia, offer the potential for future breakthroughs in pediatric cancer etiologic research.