The COLDENT study is a “population-based” case–control study carried out in the Montreal metropolitan area (Montreal island and Laval), Quebec, from January 2013 to December 2019. Cases were patients with histologically confirmed colon or rectal cancer diagnosed within the 6 months preceding their identification. Case identification relied on the assistance of medical staff in the surgery and oncology departments of five main hospitals providing CRC care to residents of Montreal island and Laval. The control series was selected by random sampling, within age- and sex-based strata, of the population of Montreal island and Laval, using the Quebec Electoral Office lists during 2013–2019. Specifically, we targeted a control-to-case ratio of approximately 1:1 within strata defined by age (10-year categories) and sex. Identified or selected subjects were sent the study introductory letter and received a phone call from research staff in the following week; several callbacks were made to numbers that did not respond. The inclusion criteria for both cases and controls were: (1) aged 40–80 years; (2) resident of Montreal island or Laval; (3) Canadian citizen; (4) speaking English and/or French; (5) no prior diagnosis of cancer; and (6) no prior diagnosis of an inflammatory or hereditary bowel disease, including Lynch syndrome, hereditary non-polyposis colorectal cancer, familial adenomatous polyposis, and related polyposis syndromes. Eligible respondents who agreed to participate were invited to complete a multi-item study questionnaire in a face-to-face or telephone interview. If an eligible respondent was unable to attend the interview, the questionnaire was offered for self-administration; in that case, the participant was instructed on how to complete it and was called back by research staff upon receipt of the completed questionnaire.
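The stratified control selection described above can be illustrated with a short Python sketch. It is a simplified, hypothetical illustration rather than the study's actual procedure: it assumes a static sampling frame (a list of dicts with hypothetical `age` and `sex` keys standing in for electoral-list records), 10-year bands 40–49, 50–59, 60–69, and 70–80, and a single draw once all cases are known, whereas the study sampled controls throughout 2013–2019.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

Stratum = Tuple[int, str]  # (lower bound of the 10-year age band, sex)


def age_band(age: int) -> Optional[int]:
    """Lower bound of the 10-year age band, or None outside the 40-80 range.

    Assumption: age 80 is folded into the 70s band so that the whole
    40-80 eligibility range is covered by four bands.
    """
    if age < 40 or age > 80:
        return None
    return min(age // 10 * 10, 70)


def controls_needed(cases: Iterable[Tuple[int, str]], ratio: int = 1) -> Counter:
    """Count cases per (age band, sex) stratum, scaled by the control-to-case ratio."""
    counts: Counter = Counter()
    for age, sex in cases:
        band = age_band(age)
        if band is not None:
            counts[(band, sex)] += ratio
    return counts


def sample_controls(frame: Iterable[dict], cases: Iterable[Tuple[int, str]],
                    ratio: int = 1, seed: Optional[int] = None) -> List[dict]:
    """Randomly sample controls from the frame, stratum by stratum."""
    rng = random.Random(seed)
    pools: Dict[Stratum, List[dict]] = defaultdict(list)
    for person in frame:
        band = age_band(person["age"])
        if band is not None:  # people outside 40-80 are not eligible
            pools[(band, person["sex"])].append(person)
    selected: List[dict] = []
    for stratum, n in controls_needed(cases, ratio).items():
        pool = pools.get(stratum, [])
        # rng.sample draws without replacement; cap the draw at the pool size
        selected.extend(rng.sample(pool, min(n, len(pool))))
    return selected
```

For example, with cases aged 45 (F), 47 (F), and 62 (M), `sample_controls(frame, cases, seed=1)` returns two women from the 40s band and one man from the 60s band. Sampling within strata, rather than matching individually, is what allows the later analysis to use unconditional logistic regression adjusted for age and sex.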
Face-to-face interviews were carried out in research units, at the participant’s home, or in hospitals for CRC patients.
The study was approved by the Research Ethics Committees of all participating institutions, and all participants received complete information about the study before signing the consent form.
Data on PD were collected using eight validated questions that address three categories of information on the subject’s periodontal condition: history of diagnosis or treatment of PD; symptoms or complications of PD; and self-awareness of PD [29, 30] (see Table 1). For each category, the questions with the highest diagnostic performance indicators were selected, based on the results of a previous systematic review by Blicher on the validity of self-reported periodontal disease measures. In a more recent systematic review with meta-analysis of self-reported PD, by Abbood, the pooled diagnostic odds ratios (95% CI) for moderate PD were 2.38 (1.35–4.2) for the question on previous treatment of PD (deep cleaning), 6.99 (3.17–15.43) for tooth mobility without injury, 1.40 (0.91–2.16) for gum bleeding, and 3.20 (2.23–4.57) for self-awareness of having gum disease; the corresponding values for severe PD were 11.72 (4.12–33.36), 2.24 (1.05–4.80), 1.95 (1.25–3.03), and 3.35 (2.17–5.18). Moderate and severe PD were defined according to the gold-standard PD case definition of the Centers for Disease Control and Prevention and the American Academy of Periodontology (CDC-AAP).
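To make the diagnostic odds ratio (DOR) concrete: it is the odds of answering Yes among subjects who have PD by the reference standard, divided by the odds of answering Yes among those who do not, so a DOR of 1 means the question does not discriminate (which is why an interval such as 0.91–2.16 for gum bleeding and moderate PD is not statistically significant). The sketch below computes a DOR and a Woolf log-scale 95% CI from a single 2×2 table; the counts are hypothetical, and the pooled values quoted above come from a meta-analysis, not from one table like this.

```python
import math
from typing import Tuple


def diagnostic_odds_ratio(tp: int, fp: int, fn: int, tn: int,
                          z: float = 1.96) -> Tuple[float, float, float]:
    """DOR = (TP/FN) / (FP/TN), with a Woolf (log-scale) confidence interval.

    tp/fn: subjects with PD by the reference standard (e.g. CDC-AAP)
    answering Yes/No; fp/tn: subjects without PD answering Yes/No.
    All four cells must be positive (no continuity correction is applied).
    """
    if min(tp, fp, fn, tn) <= 0:
        raise ValueError("all cells must be > 0")
    dor = (tp * tn) / (fp * fn)
    # standard error of ln(DOR)
    se_log = math.sqrt(1 / tp + 1 / fp + 1 / fn + 1 / tn)
    lower = math.exp(math.log(dor) - z * se_log)
    upper = math.exp(math.log(dor) + z * se_log)
    return dor, lower, upper
```

With hypothetical counts of 40 Yes / 20 No among subjects with PD and 10 Yes / 30 No among those without, `diagnostic_odds_ratio(40, 10, 20, 30)` gives a DOR of 6.0, with an interval that is symmetric around it on the log scale.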
In addition, seven study questionnaires were administered: Sociodemographic and Medical History, Smoking, Height and Weight, Anti-inflammatory Medications, Oral Health, Food Frequency (FFQ), and Lifetime Total Physical Activity (LTPAQ) questionnaires [32,33,34,35,36]. Through these questionnaires, we collected data on history of cigarette smoking, including age started, age ended, years quit during the period of use, and intensity (number of cigarettes smoked per day, per week, or per month). Participants were asked to report all occupational, household, and recreational activities they had done in their lifetime. The minimum threshold for an activity to be reported in the LTPAQ is 124 h/year for occupational, 112 h/year for household, and 32 weeks/year for recreational activities. Each activity was described in terms of duration (age started and age ended), frequency (number of hours per week, weeks per month, and months per year of practice), and intensity (light, moderate, or vigorous). A fourth category, weak intensity, was used only for occupational activities, to describe those that require sitting with minimal walking.
For dietary risk factors, data were collected on intake of different kinds of red meats, processed meats, and alcoholic beverages since adulthood. Specifically, red meats referred to meat in hamburgers, beef, pork, and lamb; processed meats referred to bacon, hot dogs, and other processed meats such as salami, bologna, and sausages; and alcoholic drinks referred to beer, wine, and liquor. The FFQ was administered for four age periods: 20–34, 35–49, 50–64, and 65–80 years. Interviewers relied on the lifetime grid technique to enhance recall accuracy.
Coding of data on periodontal disease and covariates
Subjects were classified as having a positive history of PD if they reported any of the following: a previous professional diagnosis or treatment of PD, frequent gingival bleeding, tooth loss caused by tooth mobility or PD, or self-awareness of having PD. The periodontal health status of participants who answered Yes only to question 6, about tooth mobility (see Table 1), was considered unknown, since tooth mobility can also be caused by occlusal trauma in a healthy periodontium. Participants were considered “unexposed” to PD if none of the responses they provided to the eight questions was positive.
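The three-way exposure coding above can be expressed as a small decision rule. This Python sketch assumes that questions are numbered 1–8 as in Table 1, that every Yes other than to question 6 satisfies one of the positive criteria, and that a subject with no answers at all is coded unknown; the last rule is a hypothetical choice, since the text does not address it. Unanswered questions are ignored, following "none of the responses they provided".

```python
from typing import Mapping, Optional

MOBILITY_QUESTION = 6  # question 6 in Table 1 (tooth mobility)


def classify_pd(responses: Mapping[int, Optional[bool]]) -> str:
    """Classify PD exposure as "exposed", "unexposed" or "unknown".

    responses maps question number (1-8) -> True (Yes), False (No) or
    None (not answered). Assumption: any Yes other than to question 6
    meets one of the positive criteria (diagnosis/treatment, frequent
    bleeding, tooth loss, self-awareness).
    """
    answered = {q: a for q, a in responses.items() if a is not None}
    if not answered:
        return "unknown"  # hypothetical rule: nothing answered, cannot classify
    positives = {q for q, a in answered.items() if a}
    if not positives:
        return "unexposed"
    if positives == {MOBILITY_QUESTION}:
        return "unknown"  # mobility alone may reflect occlusal trauma
    return "exposed"
```

Coding "mobility only" as unknown rather than exposed keeps a nonspecific symptom from inflating the exposed group, at the cost of excluding or separately handling those subjects in the analysis.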
Covariates for adjustment included age, sex, educational attainment (elementary school vs. high school and ≥ college), annual personal income, body mass index (BMI), history of type II diabetes, history of CRC in first-degree relatives, history of regular use of NSAIDs (Yes/No), lifetime cigarette smoking (quantified in pack-years), lifetime consumption of red meats, processed meats, and alcohol, and lifetime cumulative physical activity score.
Regular use of NSAIDs, assessed separately for aspirin and non-aspirin NSAIDs (NA-NSAIDs), was defined as use of at least one tablet/capsule per month for six consecutive months or longer. The number of pack-years was calculated as the number of cigarettes smoked per day divided by 20 (i.e., packs per day), multiplied by the number of years smoked. Lifetime consumption of red meats, processed meats, and alcoholic drinks was calculated as the average number of weekly servings for red and processed meats, and of daily drinks for alcohol, since the participant was 20 years old.
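The pack-year and lifetime-intake calculations can be sketched in Python as follows. Several details are assumptions not stated in the text: intensities reported per week or per month are converted to per day using 7 and 30.4 days; years smoked are taken as age ended minus age started minus years quit; and the lifetime average weights each FFQ age period by the number of years the participant actually lived in it up to the age at interview.

```python
from typing import Sequence, Tuple

# Assumed conversion factors to cigarettes per day (a month taken as 30.4 days).
PER_DAY = {"day": 1.0, "week": 1 / 7, "month": 1 / 30.4}

# FFQ age periods as [start, end) ages; 65-80 is written as [65, 81).
FFQ_PERIODS: Sequence[Tuple[int, int]] = ((20, 35), (35, 50), (50, 65), (65, 81))


def pack_years(cigarettes: float, unit: str, age_started: float,
               age_ended: float, years_quit: float = 0.0) -> float:
    """(cigarettes per day / 20) x years smoked, for one smoking episode."""
    years_smoked = age_ended - age_started - years_quit
    if years_smoked < 0:
        raise ValueError("years quit exceed the smoking period")
    return cigarettes * PER_DAY[unit] / 20 * years_smoked


def lifetime_average(intake_by_period: Sequence[float], age: float) -> float:
    """Average intake since age 20, weighting each FFQ period by years lived in it.

    intake_by_period: weekly servings (meats) or daily drinks (alcohol) for
    each of the four FFQ periods, in order.
    """
    total = years = 0.0
    for (start, end), intake in zip(FFQ_PERIODS, intake_by_period):
        span = max(0.0, min(end, age) - start)  # years lived in this period
        total += intake * span
        years += span
    return total / years if years else 0.0
```

For a participant who smoked 20 cigarettes per day from age 20 to 50 with a 5-year break, `pack_years(20, "day", 20, 50, years_quit=5)` gives 25 pack-years; for a 50-year-old reporting 2 and 4 weekly servings in the first two FFQ periods, `lifetime_average([2, 4, 6, 8], 50)` gives 3.0, because the later periods have not yet been lived.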
The lifetime physical activity score was defined as the average total energy expended during occupational, recreational, and household activities, expressed in metabolic equivalent of task (MET) units as MET-hour/week/year [37, 39,40,41]. For this, we assigned MET values to the intensity categories: weak, 1.5; light, 2.5; moderate, 4; and vigorous, 8. To account for both duration and intensity, each activity was converted into energy expended by multiplying its assigned MET value by the reported hours spent in the activity per year and by the number of years the activity lasted. All of a subject’s activities were then summed to derive lifetime cumulative energy expenditure in MET-hours-years. This cumulative measure was then divided by the individual’s age (in years) and by 52 (the number of weeks in a year) to obtain the physical activity score in MET-hour/week/year.
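The scoring steps above translate directly into a short Python sketch. It assumes that hours per year are obtained as hours per week × weeks per month × months per year (the three frequency items collected in the LTPAQ), that the number of years is age ended minus age started, and that `age` is the age at interview; the `Activity` record is a hypothetical structure for illustration.

```python
from dataclasses import dataclass
from typing import Iterable

MET_VALUES = {"weak": 1.5, "light": 2.5, "moderate": 4.0, "vigorous": 8.0}


@dataclass
class Activity:
    intensity: str          # "weak" (occupational only), "light", "moderate", "vigorous"
    age_started: float
    age_ended: float
    hours_per_week: float
    weeks_per_month: float
    months_per_year: float

    def hours_per_year(self) -> float:
        return self.hours_per_week * self.weeks_per_month * self.months_per_year

    def cumulative_met_hours(self) -> float:
        # MET value x hours per year x number of years the activity lasted
        years = self.age_ended - self.age_started
        return MET_VALUES[self.intensity] * self.hours_per_year() * years


def lifetime_activity_score(activities: Iterable[Activity], age: float) -> float:
    """Sum cumulative energy over all activities, then divide by age and by 52."""
    cumulative = sum(a.cumulative_met_hours() for a in activities)
    return cumulative / age / 52
```

For example, moderate jogging 5 h/week, 4 weeks/month, 12 months/year from age 20 to 40 contributes 4 × 240 × 20 = 19,200 MET-hours; for a 60-year-old with no other activities, the score is 19,200 / 60 / 52 ≈ 6.15.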
The distributions of potential confounders in the case and control series were described using the median and interquartile range for continuous variables and percentages for categorical variables.
We fitted multivariable unconditional logistic regression models to estimate the rate ratio (RR) quantifying the association between PD and CRC. In the multivariable models, the RR was adjusted for the matching variables (age and sex) and for all other potential confounders, namely educational attainment, annual personal income, BMI, history of type II diabetes, history of CRC in first-degree relatives, history of regular use of aspirin and of NA-NSAIDs, lifetime cigarette smoking, lifetime consumption of red meats, lifetime consumption of processed meats, lifetime consumption of alcohol, and lifetime cumulative physical activity score. Linearity in the logit was assessed for all continuous independent variables (age, BMI, annual personal income, lifetime consumption of red meats, processed meats, and alcoholic drinks, and lifetime physical activity score) using the Box-Tidwell test, which involves simultaneously adding to the multivariable model an interaction term between each continuous variable and its natural logarithm (Xi × ln(Xi)). None of these interaction terms was statistically significant at an alpha level of 0.05.
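The Box-Tidwell step consists of augmenting the design with Xi × ln(Xi) terms and refitting the full model with all of them entered at once; a significant coefficient on a term suggests departure from linearity in the logit for that covariate. The Python sketch below shows only the term construction, using hypothetical column names; the refit itself would be done with any logistic regression routine (for instance R's `glm` with a binomial family). The text does not say how zero values (e.g., non-drinkers) were handled, since ln(0) is undefined; this sketch simply rejects non-positive values.

```python
import math
from typing import Dict, List, Sequence

Row = Dict[str, float]


def add_box_tidwell_terms(rows: Sequence[Row], continuous: Sequence[str]) -> List[Row]:
    """Return copies of the rows with an X*ln(X) column for each continuous covariate.

    New columns are named "<name>_xln" (a hypothetical naming convention).
    The input rows are not modified.
    """
    augmented: List[Row] = []
    for row in rows:
        new_row = dict(row)
        for name in continuous:
            x = row[name]
            if x <= 0:
                # ln(X) is undefined for X <= 0; such variables need a shift first
                raise ValueError(f"{name} must be > 0 for the Box-Tidwell term")
            new_row[f"{name}_xln"] = x * math.log(x)
        augmented.append(new_row)
    return augmented
```

Entering all Xi × ln(Xi) terms simultaneously, as described above, tests each covariate's functional form while the others are held in the same adjusted model, rather than checking each variable in isolation.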
The percentage of missing data was below 10% for every variable except regular use of aspirin (23%), because the aspirin question was absent from the NSAID questionnaire during the first year of the study (see Supplementary Table S1). Missing data were handled by multiple imputation using the Expectation–Maximization with Bootstrapping algorithm: 10 complete datasets were generated, from which pooled “final” RR estimates and their 95% confidence intervals (CIs) were derived. To improve the performance of the imputation algorithm, auxiliary variables were included in the imputation model in addition to all variables in the “associational” models. Continuous variables with skewed distributions or extreme values (cumulative cigarette smoking and lifetime consumption of red meats, processed meats, and alcoholic drinks) were log-transformed (Log10(Xi) or Log10(Xi + 1)) before imputation. Imputed values of continuous variables were restricted to the observed minimum and maximum values. Imputation respected the scale of each variable (continuous, ordinal, or categorical) and was performed with the Amelia II package in R, version 3.5.3.
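The text does not name the rule used to pool the 10 per-dataset estimates; Rubin's rules are the standard choice and are assumed in this Python sketch. Pooling is done on the log scale: the point estimate is the mean of the ln(RR) values, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). For simplicity the interval uses a normal reference distribution instead of the t distribution with Rubin's degrees of freedom, so it is slightly narrower when m is small.

```python
import math
from statistics import NormalDist, fmean, variance
from typing import Sequence, Tuple


def pool_rubin(log_rr: Sequence[float], se: Sequence[float],
               level: float = 0.95) -> Tuple[float, float, float]:
    """Pool per-dataset ln(RR) estimates and their standard errors (Rubin's rules).

    Returns (RR, lower, upper) on the ratio scale, using a normal
    approximation for the confidence interval.
    """
    m = len(log_rr)
    if m < 2 or len(se) != m:
        raise ValueError("need >= 2 imputed datasets with one SE per estimate")
    q_bar = fmean(log_rr)                        # pooled point estimate
    within = fmean([s ** 2 for s in se])         # average within-imputation variance
    between = variance(log_rr)                   # between-imputation variance (n - 1)
    total = within + (1 + 1 / m) * between       # total variance
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(total)
    return math.exp(q_bar), math.exp(q_bar - half_width), math.exp(q_bar + half_width)
```

If all 10 imputed datasets gave the same estimate, the between-imputation term would vanish and the pooled CI would equal the single-dataset CI; disagreement between imputations widens the interval, which is how uncertainty due to missing data (here, mainly the 23% missing aspirin use) is carried into the final RR.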