Abstract
Cancer centres rely on electronic information in oncology information systems (OIS) to guide patient care. We investigated the completeness and accuracy of routinely collected head and neck cancer (HNC) data sourced from an OIS for suitability in prognostic modelling and other research. Three hundred and fifty-three adults diagnosed from 2000 to 2017 with head and neck squamous cell carcinoma, treated with radiotherapy, were eligible. Thirteen clinically relevant variables in HNC prognosis were extracted from a single-centre OIS and compared to the same variables compiled separately in a research dataset. Agreement between the two datasets was assessed using Cohen’s kappa coefficient (κ) for categorical variables, and intraclass correlation coefficients (ICC) for continuous variables. Research data was 96% complete compared to 84% for OIS data. Agreement was perfect for gender (κ = 1.000); high for age (ICC = 0.993), site (κ = 0.992), T (κ = 0.851) and N (κ = 0.812) stage, radiotherapy dose (ICC = 0.889), fractions (ICC = 0.856), and duration (ICC = 0.818), and chemotherapy treatment (κ = 0.871); substantial for overall stage (κ = 0.791) and vital status (κ = 0.689); moderate for grade (κ = 0.547); and poor for performance status (κ = 0.110). Thirty-one other variables were poorly captured and could not be statistically compared. Documentation of clinical information within the OIS for HNC patients is routine practice; however, OIS data was less correct and complete than data collected for research purposes. Substandard collection of routine data may hinder advancements in patient care. Improved data entry, integration with clinical activities and workflows, system usability, data dictionaries, and training are necessary for OIS data to generate robust research. Data mining from clinical documents may supplement structured data collection.
Introduction
Globally, head and neck cancer (HNC) contributed to 4.9% of all new cancers and 4.7% of all cancer deaths in 2020 [1]. In New South Wales (NSW), Australia, by comparison, HNC is projected to account for 2.8% of all new cancer cases and 2.7% of all cancer deaths in 2022, with incidence and deaths projected to have increased by 15.8% and 17.3% respectively since 2017 [2]. Predicting survival outcomes in patients with HNC using robust data can help guide treatment practices to improve care and outcomes.
In NSW, cancer data can be accessed from the Australian Cancer Database [3] or the NSW Central Cancer Registry [4]; however, neither resource includes the clinical detail necessary to predict HNC survival. Unavailable prognostic factors include smoking history [5], comorbidities [6], human papillomavirus infection [5, 7], fitness for surgery [8], cancer operability [8], size and location of involved lymph nodes in the neck [5, 9, 10], and radiotherapy dose and duration [5, 8, 11, 12].
As part of routine clinical practice, HNC data are entered into structured or free-text fields within the oncology information system (OIS), extracted using structured query language [13], and reported to the NSW Central Cancer Registry. Independently, research personnel at Prince of Wales Hospital (POWH), in Sydney, Australia, have compiled a HNC dataset for research purposes using information sourced from clinical documents uploaded into the OIS. The documents are reports from clinical, pathological, and imaging examinations; treatment summaries; correspondence; discharge notes; and follow-up consultations. High quality HNC research datasets are scarce; therefore, investigating the suitability of routinely collected OIS data for outcomes analysis is warranted. Conversely, real-world oncology data is used extensively for other purposes [14], including billing and reimbursement for healthcare services [15, 16], documentation, assessment, and provision of clinical care and treatment pathways [17], epidemiology of disease incidence, prevalence, and trends for disease monitoring [18], clinical trial drug development [19], and machine learning (clinical decision-making, treatment planning, image segmentation, and image guidance) [20].
To provide insight into the utility of OIS, previous studies have investigated data quality and completeness. One study compared the utilisation of the same OIS by two hospitals in Australia, finding one used the full capacity of the OIS fields, while the other focused predominantly on booking and patient tracking [21]. A second study compared the concordance of clinical data in cancer patients between an OIS and a cancer registry. Smoking was highly complete in the OIS; however, only moderate agreement was evident [22]. A third study investigated whether cancer quality measures from an OIS were adequate for Medicare reimbursement in the US [23]. The study reported varying data completion, resulting from underfilling and inconsistencies in OIS data elements, and concluded automated OIS extracts could not yet replace manual abstraction. The present study investigates the quality and completeness of OIS data for prognostic modelling in HNC at a major metropolitan teaching and tertiary referral hospital.
Methods
Patient eligibility criteria
Adult patients aged 18 years or older presenting at POWH between 1 January 2000 and 31 December 2017 with a newly diagnosed invasive or in-situ mucosal HNC of squamous cell carcinoma morphology, treated with definitive radiotherapy (± chemotherapy or surgery), with known stage and no distant metastases, were eligible. Patients diagnosed with cancer of the external lip or commissure, or with distant metastases, were excluded. The 2009 Union for International Cancer Control TNM 7th edition manual was used to clinically stage all patients in this study.
Data collection
The research dataset was previously compiled by trained research personnel. Demographic, diagnostic, treatment, and outcome data was extracted from clinical, histopathological, treatment, and follow-up documents to create a structured dataset (SESLHD HREC 10/040). The research dataset is considered the gold standard dataset for this study. Death data for the research dataset was sourced from: the National Death Index (NDI) via probabilistic record linkage (EO2017/5/392) in February 2018; internal hospital records; the NSW Registry of Births, Deaths and Marriages; and the Ryerson Index (death notices and obituaries in Australian newspapers). Due to lags in adjudication, cause of death data was unavailable for patients registered on the NDI in the two years prior to linkage (2016–2017).
The OIS dataset consisted of routinely collected HNC data extracted from the POWH OIS (MOSAIQ, version 2.60, by Elekta [24]), a proprietary electronic medical record, using structured query language (SQL Server 8 – Crystal Reports). The OIS contains fields for administrative information (e.g., personal details and appointments), patient characteristics (e.g., tobacco use, performance score), disease features (e.g., ICD codes, diagnosis date, staging, histology, morphology), treatment details (e.g., radiotherapy, chemotherapy, surgery, hormone/immunotherapy), and follow-up (e.g., disease and vital status). The OIS is a stand-alone system used by radiation oncology departments in Australia with consistency in the availability of the OIS fields required for mandatory government reporting [25, 26].
Radiotherapy and chemotherapy data in the OIS dataset required manual curation post-extraction. Data extraction resulted in multiple records for each patient, one for each anatomical site receiving radiotherapy. Patients could receive radiotherapy to local, regional, or distant sites, with variation in the terms used to define the radiotherapy treatment sites. Duplicates were removed based on fields uniquely identifying the delivered doses. Radiotherapy data for each diagnosis was manually reviewed by a radiation oncologist to determine the correct dose/fractions to the primary site and neck, which were combined to provide a single value for dose and fraction. Radiotherapy duration was defined as the number of days between the first and last fraction. For quality assurance, radiotherapy data was validated against the full medical record on a random sample of 30 patients.
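The curation steps described above can be illustrated with a minimal Python sketch. It covers only the mechanical parts (removing duplicate extract rows and computing treatment duration); the choice of the correct dose and fractions was made manually by a radiation oncologist and is not automated here. The record layout, field names, and example values are hypothetical and do not reflect the actual OIS schema.

```python
from datetime import date

# Hypothetical extract: one row per treated site. Field names are
# illustrative assumptions, not the real OIS column names.
records = [
    {"patient_id": "A", "site": "primary", "dose_gy": 66.0, "fractions": 33,
     "first_fraction": date(2015, 3, 2), "last_fraction": date(2015, 4, 15)},
    {"patient_id": "A", "site": "primary", "dose_gy": 66.0, "fractions": 33,
     "first_fraction": date(2015, 3, 2), "last_fraction": date(2015, 4, 15)},
    {"patient_id": "A", "site": "neck", "dose_gy": 54.0, "fractions": 33,
     "first_fraction": date(2015, 3, 2), "last_fraction": date(2015, 4, 15)},
]

KEY_FIELDS = ("patient_id", "site", "dose_gy", "fractions",
              "first_fraction", "last_fraction")


def deduplicate(rows):
    """Keep the first row for each unique combination of the key fields."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[field] for field in KEY_FIELDS)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def duration_days(rows):
    """Days between the earliest first fraction and the latest last fraction."""
    start = min(row["first_fraction"] for row in rows)
    end = max(row["last_fraction"] for row in rows)
    return (end - start).days


curated = deduplicate(records)
print(len(curated), duration_days(curated))
```

In this toy extract the repeated primary-site row is dropped, leaving one row each for the primary site and the neck for manual review.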
Statistical analysis
Forty-four demographic, tumour, treatment, and outcome variables of clinical relevance in HNC survival were analysed. Data was reported as frequencies (%) for categorical data and mean (standard deviation) and median (interquartile range, IQR) for normally and non-normally distributed continuous data respectively (distribution determined using the Shapiro-Wilk test).
The McNemar test was used to determine differences in the distribution of categorical values between the two datasets. The McNemar-Bowker test was used for data with more than two levels. A paired t-test was used to assess differences between normally distributed continuous data, while the Wilcoxon Signed Rank test was used for non-normally distributed continuous data.
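As an illustration of the paired comparison for a two-level variable, the sketch below computes a two-sided exact (binomial) McNemar p-value from the discordant pairs. This is an assumption for illustration: the study used SPSS, and whether the exact or chi-square form was applied is not stated. The function and variable names are hypothetical.

```python
from math import comb


def discordant_counts(research, ois, positive):
    """Count pairs where exactly one dataset records the `positive` level."""
    b = sum(1 for r, o in zip(research, ois) if r == positive and o != positive)
    c = sum(1 for r, o in zip(research, ois) if r != positive and o == positive)
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the two discordant counts."""
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, so no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) probability of k or fewer, doubled for two sides.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy paired data: chemotherapy recorded (yes/no) in each dataset.
research = ["yes", "yes", "no", "yes", "no"]
ois = ["yes", "no", "no", "no", "no"]
b, c = discordant_counts(research, ois, "yes")
print(b, c, mcnemar_exact(b, c))
```

Only the discordant pairs carry information about a difference between the datasets; concordant pairs do not enter the calculation.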
Cohen’s kappa coefficient (κ) was used to investigate agreement between categorical variables, reported alongside the standard error (SE). We adapted the Landis et al. [27] interpretation of κ, with 0 to < 0.2 classified as poor agreement, 0.2 to < 0.4 as slight, 0.4 to < 0.6 as moderate, 0.6 to < 0.8 as substantial, 0.8 to < 1.0 as high, and 1.0 as perfect. Intraclass correlation coefficients with two-way mixed models assessing absolute agreement were used to investigate agreement between continuous variables, reported alongside a 95% confidence interval (95% CI).
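A minimal sketch of Cohen's kappa and the agreement bands described above is given below, in plain Python. The function names are hypothetical, and placing negative kappa values in the "poor" band is an assumption, since the adapted scale starts at 0.

```python
from collections import Counter


def cohens_kappa(a, b):
    """Cohen's kappa for two paired lists of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from the marginal category frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / n ** 2
    if expected == 1.0:
        return 1.0  # both datasets contain a single identical category
    return (observed - expected) / (1 - expected)


def agreement_band(kappa):
    """Map kappa to the adapted bands used in this study."""
    if kappa >= 1.0:
        return "perfect"
    for lower, label in ((0.8, "high"), (0.6, "substantial"),
                         (0.4, "moderate"), (0.2, "slight")):
        if kappa >= lower:
            return label
    return "poor"  # assumption: negative values are also treated as poor


# Toy example: T stage recorded in the research and OIS datasets.
research = ["T1", "T2", "T2", "T3"]
ois = ["T1", "T2", "T3", "T3"]
kappa = cohens_kappa(research, ois)
print(round(kappa, 3), agreement_band(kappa))
```

In the toy example, three of four pairs agree (observed 0.75) against a chance agreement of 0.3125, giving κ = 7/11.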
The level of significance for all tests was P < 0.05 and all P values are two-sided. Statistical analysis was performed using SPSS 26 (IBM, Armonk, New York), and all analysis was paired.
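For the continuous variables, the intraclass correlation for absolute agreement can be sketched from the two-way ANOVA mean squares. The version below is the single-measures form, (MSR − MSE) / (MSR + (k − 1)MSE + (k/n)(MSC − MSE)), with k = 2 datasets; whether single or average measures were reported is not stated in the text, so this choice is an assumption, and confidence intervals are omitted.

```python
def icc_absolute_single(x, y):
    """Single-measures, absolute-agreement ICC for two paired measurements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    n, k = len(x), 2
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [(a + b) / k for a, b in zip(x, y)]
    col_means = [sum(x) / n, sum(y) / n]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between datasets
    ss_total = sum((v - grand) ** 2 for v in list(x) + list(y))
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + (k / n) * (msc - mse))


# Toy doses (Gy): the second dataset is shifted by a constant 1 Gy.
research_dose = [1.0, 2.0, 3.0]
ois_dose = [2.0, 3.0, 4.0]
print(icc_absolute_single(research_dose, ois_dose))
```

The toy data are perfectly correlated, yet the constant offset lowers the coefficient to 2/3, which is why an absolute-agreement definition suits a comparison against a gold standard.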
Ethical considerations
The study was approved by the NSW Population and Health Services Research Ethics Committee (2019/ETH12196).
Results
A total of 353 patients were eligible for inclusion in the study. Eighteen (5%) patients had two primary head and neck malignancies. All 44 variables that were investigated, and their level of completeness in both datasets, are displayed in Fig. 1, of which 13 could be statistically compared (Tables 1, 2 and 3). The overall data completion rate for the research and OIS datasets was 96% and 26% respectively for the 44 investigated variables, and 96% and 84% for the 13 analysed variables. The overall agreement between the two datasets for the 13 variables was 0.79.
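A completion rate of this kind can be computed as the share of non-missing cells across the variables of interest. The sketch below is one plausible reading; how the overall rate was aggregated in the study (per cell or as a mean of per-variable rates) is not stated, and the row layout, variable names, and the definition of "missing" are assumptions.

```python
def completeness(rows, variables, missing=(None, "")):
    """Fraction of (row, variable) cells holding a non-missing value."""
    total = len(rows) * len(variables)
    if total == 0:
        return 0.0
    filled = sum(
        1 for row in rows for name in variables
        if row.get(name) not in missing
    )
    return filled / total


# Toy dataset with hypothetical variable names.
rows = [
    {"age": 61, "grade": None},
    {"age": 70, "grade": "G2"},
]
print(completeness(rows, ["age", "grade"]))
```

Here three of the four cells are populated, giving a completion rate of 0.75.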
Patient characteristics
Three of the twelve variables investigated were statistically compared (Table 1). Data on age and gender were 100% complete in the OIS dataset, with perfect agreement for gender (81% male). The mean age was 61.4 years in the research dataset and 61.9 years in the OIS dataset, with high agreement (ICC = 0.993, 95% CI 0.990–0.995, p < 0.001). The mean difference of 0.5 years was statistically significant (p < 0.001). Performance status was known for only 12% of the cohort in the OIS dataset compared to 98% in the research dataset (κ = 0.110, SE 0.136, p = 0.945).
Tumour features
Table 2 displays five of the nine clinical tumour variables that could be compared. Tumour site was 100% complete in both datasets, demonstrating high agreement (κ = 0.992, SE 0.006, p < 0.001). Carcinoma of the oropharynx (41%) and larynx (30%) were the most common tumour sites. For stage, agreement was high for T stage (κ = 0.851, SE 0.023, p < 0.001) and N stage (κ = 0.812, SE 0.027, p < 0.001), and substantial for overall stage (κ = 0.791, SE 0.027, p < 0.001). In both datasets, almost half of all tumours were stage IV, reflected by the high incidence of N2/N3 stage disease. Data on grade was 78% and 56% complete in the research and OIS datasets, respectively, and agreement was moderate (κ = 0.547, SE 0.033, p < 0.001).
Treatment details
Ten treatment factors were collected and four compared (Table 3). According to the research dataset, radiotherapy was delivered at a median dose of 64 Gy, in a median of 32 fractions, over a median of 39 days. In the OIS dataset, radiotherapy was delivered at a median dose of 66 Gy, in a median of 33 fractions, over a median of 39 days. This resulted in high agreement for dose (ICC = 0.889, 95% CI 0.862–0.910, p < 0.001), fractions (ICC = 0.856, 95% CI 0.821–0.883, p < 0.001) and duration (ICC = 0.818, 95% CI 0.776–0.853, p < 0.001). The median differences of 2 Gy (p = 0.018) and one fraction (p = 0.016) were statistically significant.
Chemotherapy data was not routinely collected for diagnoses before 2012 in the OIS dataset, resulting in 22% completion compared to 100% for the research dataset. From 2012, 68 (19%) and 58 (16%) tumours were treated with chemotherapy according to the research and OIS datasets, respectively, resulting in high agreement (κ = 0.871, SE 0.073, p < 0.001). Only 1% of surgical data was available in the OIS dataset compared to 100% for the research dataset, preventing further analysis.
Treatment outcomes
Information on treatment outcomes could not be compared since the number of tumours which responded to treatment could not be determined in the OIS dataset despite available fields (Fig. 1). Treatment outcome data was largely complete in the research dataset; treatment failed for 59 (17%) and 43 (12%) tumours at the local or nodal site, respectively. Subsequently, 27 and 18 retreated tumours experienced a second local and nodal failure, respectively. Thirty-two (9%) patients presented with metastasis during follow-up, and 51 (14%) patients with a new primary.
Vital status
The date of last follow-up was only available in the research dataset, with a median follow-up of 3.9 years (IQR 1.7–7.0 years). The research dataset captured vital status and cause of death, enabling classification as HNC-related death, non-HNC-related death, or unknown cause, whereas the OIS dataset only included vital status (Table 3). Substantial agreement was observed for vital status (κ = 0.689, SE 0.038, p < 0.001). In the research dataset, 157 deaths were recorded, of which 81 (52%) were due to HNC, 73 (46%) from other causes, and 3 (2%) from unknown causes.
Time expenditure
Data extraction and entry time was retrospectively estimated at 99 and 265 h for the OIS and research datasets, respectively. Query development and testing for the OIS dataset was estimated at 35 h with five hours of data curation.
Discussion
Routinely collected HNC data sourced from a major metropolitan teaching and tertiary referral hospital OIS was less accurate and complete than a dataset compiled for research purposes. This led to varying levels of agreement when comparing the datasets, a consequence of OIS data entry practices and utility. This is the first study to investigate the quality and completeness of routinely collected HNC data in an OIS.
Cancer centres rely on routinely collected patient information to inform decision-making and guide patient care; however, relatively few studies have examined the accuracy and completeness of OIS data. One US study investigated the concordance of OIS compared to cancer registry data for 11,110 patients [22]. The authors reported a high completion rate for data elements (age, race/ethnicity, gender, and smoking-related cancer) in the OIS, and an overall moderate agreement for these variables (κ = 0.78), comparable to the overall agreement in our study (κ = 0.79). Results from our study and that by LeLaurin et al. [22] support the conclusion that complete data does not imply high quality data.
Cancer centres collect data to suit their needs, including clinical, administrative, and reporting purposes. The OIS has the capacity to capture most of the clinically relevant demographic, diagnostic, treatment, and outcome data necessary for HNC prognostic research; however, not all user-defined fields are routinely completed. One study has demonstrated this, whereby two Australian hospitals utilised the same OIS for different purposes [21]. One hospital used the full capacity of the OIS administrative and clinical fields for documentation, while the other hospital used only the administrative fields. Our results indicate the OIS is used for both administrative and clinical purposes at this cancer centre; however, the collection of clinical information in the OIS is limited to the clinical fields mandated for collection by the government, which is insufficient for research purposes.
OIS prescription fields (radiotherapy start and end date, dose, and fractions) are automatically populated following radiotherapy treatment, resulting in multiple doses for each patient, each signifying a treatment course, anatomical treatment field, or overlapping treatment fields [13]. Prior to analysis, OIS dose data was summarised as a singular total dose. In the clinical documents, radiation oncologists report the singular total dose delivered by the OIS. To determine whether radiotherapy data extracted from a single OIS using automated and manual methods resulted in the same data accuracy, a study evaluated 251 German meningioma patient records [28]. Automated extraction resulted in significantly lower retrieval time (35 h) and higher accuracy (93.9%) compared to manual processes (668 h, 91.2% accuracy, p = 0.009). In our study, extraction of radiotherapy data from the OIS was less accurate than manual extraction from clinical documents for the research dataset.
All other fields in the OIS require manual entry of information as structured or free-text. The information reported in structured fields such as stage and radiotherapy treatment was complete, though it varied between the research and OIS datasets, while chemotherapy and surgical treatment information was limited in the OIS dataset. One explanation is that staging data at the time of initial investigation may not have been updated in the OIS following subsequent investigation with more advanced diagnostic techniques. In the research dataset, a radiation oncologist retrospectively staged each patient, with no feedback mechanism in place to update the OIS. Although we observed high statistical agreement between the datasets for tumour stage, the agreement was inadequate for clinical or epidemiological research purposes, as TNM staging is a vital prognostic factor for HNC [5, 7, 9, 29]. Implementing feedback sessions between clinicians and data administrators may improve the quality of staging information and capture changes in disease stage over time [30].
Entry of chemotherapy and surgical treatment data in the OIS fields was poor, whereas these data were largely complete in the research dataset. Free-text fields such as radiotherapy treatment site also resulted in variation between datasets. To improve the quality of radiotherapy data extracted from the OIS, standardisation of terminology is critical to ensure radiotherapy data is entered in a non-ambiguous manner for accurate data extraction and subsequent analysis [31,32,33]. Standardised treatment and outcome data are necessary to reliably investigate patterns of care and survival outcomes [5, 34, 35], and institutions contemplating research need to consider whether the structure of their OIS, data entry practices, and staff and data availability support their research needs.
Maintaining multiple datasets is neither practical nor cost-effective, an issue not unique to this OIS or cancer centre [36,37,38]. The goal is a single electronic health record that can serve multiple purposes, i.e., administration, clinical care, government reporting, and research. One approach is to improve data management practices without modifying the OIS. Clinical leadership, commitment, and engagement with clerical, medical, and management staff across the department [21], together with a strong collaborative information-sharing strategy and support for cultural change, are essential to identify, discuss, and implement processes to improve data management practices [39]. Successful implementation requires thorough planning, modification of training and resource requirements, and regular auditing to optimise the utility of the OIS [39]. The data strategy should also detail the data standards (terminologies, vocabularies and coding schemes) and accountability, and include a clear vision of how the OIS will be used to support clinical practice and improve patient outcomes [21].
In conjunction with the above data strategy, another approach is to redesign the OIS or implement external customised systems to improve OIS clinical and research functionality. For example, instating mandatory completion of structured fields in the OIS improves data completeness [40]. Customised web-based electronic data capture systems can be used in tandem with an OIS to consolidate clinical information, reduce redundancy, and improve completeness of data fields, without detracting from clinical workflow, by reducing free-text data entry and increasing use of structured data fields [41].
A third approach is to consider data mining techniques using the clinical documents in the OIS, where certain fields can be anonymised prior to the use of these documents for medical research [42]. The research dataset demonstrates the required data are available in the clinical documentation saved within the OIS. Therefore, the application of natural language processing to categorise unstructured information from clinical documents into structured data is worthy of examination [43,44,45,46]. The availability of enriched clinical data may facilitate further research and collaboration and improve outcomes for people with HNC.
There were limitations to this study. The availability of data in the research dataset is dependent on clinicians reporting the information in the clinical documents. Multiple researchers were involved in the extraction of data for the research dataset, and differences in data interpretation may be present. Both datasets relied on manual data extraction, with the possibility of human error. Multisite comparisons to broadly understand OIS practices are not possible due to a lack of research datasets.
Conclusion
Data manually extracted from unstructured clinical documents for research purposes was more complete and higher quality than data collected routinely in an OIS. The OIS dataset is not yet suitable for robust epidemiological research. Improved OIS data entry, integration with clinical activities and workflows, system usability, data dictionaries, and training are necessary before these data can be leveraged for robust research. Automated data mining techniques from electronic documents stored in the OIS should be investigated.
Data availability
Data are not available due to privacy/ethical restrictions.
Change history
03 March 2023
Missing Open Access funding information has been added in the Funding Note.
References
Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A et al (2021) Global Cancer Statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 71:209–249. https://doi.org/10.3322/caac.21660
Cancer Institute NSW. Head and neck cancer statistics. In: https://www.cancer.nsw.gov.au/research-and-data/cancer-data-and-statistics/cancer-type-summaries-for-nsw/head-and-neck-cancer-statistics Alexandria, NSW: Cancer Institute NSW; [updated 12-11-2020; cited 20-07-2021]
Australian Institute of Health and Welfare. Australian Cancer Database. In: https://www.aihw.gov.au/about-our-data/our-data-collections/australian-cancer-database [updated 13-11-2018; cited 24-07-2020]
Centre for Health Record Linkage. NSW Central Cancer Registry. In: https://www.cherel.org.au/data-dictionaries Alexandria, NSW: Cancer Institute NSW; [cited 17-06-2021]
Naghavi AO, Echevarria MI, Strom TJ, Abuodeh YA, Ahmed KA, Venkat PS et al (2016) Treatment delays, race, and outcomes in head and neck cancer. Cancer Epidemiol 45:18–25. https://doi.org/10.1016/j.canep.2016.09.005
Carey RM, Fathy R, Shah RR, Rajasekaran K, Cannady SB, Newman JG et al (2020) Association of type of treatment facility with overall survival after a diagnosis of head and neck cancer. JAMA Netw Open 3:e1919697-e1919697. https://doi.org/10.1001/jamanetworkopen.2019.19697
Rios Velazquez E, Hoebers F, Aerts HJ, Rietbergen MM, Brakenhoff RH, Leemans RC et al (2014) Externally validated HPV-based prognostic nomogram for oropharyngeal carcinoma patients yields more accurate predictions than TNM staging. Radiother Oncol 113:324–330. https://doi.org/10.1016/j.radonc.2014.09.005
Smee R, Williams JR, Kotevski DP (2019) Surgery is not the only determinant of an outcome in patients with hypopharyngeal carcinoma. Head Neck 41:1165–1177. https://doi.org/10.1002/hed.25496
van der Ploeg T, Datema F, Baatenburg de Jong R, Steyerberg EW (2014) Prediction of survival with alternative modeling techniques using pseudo values. PLoS One 9:e100234. https://doi.org/10.1371/journal.pone.0100234
Sinha P, Kallogjeri D, Gay H, Thorstad WL, Lewis JS, Jr., Chernock R et al (2015) High metastatic node number, not extracapsular spread or N-classification is a node-related prognosticator in transorally-resected, neck-dissected p16-positive oropharynx cancer. Oral Oncol 51:514–520. https://doi.org/10.1016/j.oraloncology.2015.02.098
Fujiwara RJ, Judson BL, Yarbrough WG, Husain Z, Mehra S (2017) Treatment delays in oral cavity squamous cell carcinoma and association with survival. Head Neck 39:639–646. https://doi.org/10.1002/hed.24608
Morse E, Fujiwara RJT, Judson B, Mehra S (2018) Treatment delays in laryngeal squamous cell carcinoma: a national cancer database analysis. Laryngoscope 128:2751–2758. https://doi.org/10.1002/lary.27247
Dilling TJ (2020) Artificial intelligence research: the utility and design of a relational database system. Adv Radiat Oncol 5:1280–1285. https://doi.org/10.1016/j.adro.2020.06.027
Penberthy LT, Rivera DR, Lund JL, Bruno MA, Meyer AM (2022) An overview of real-world data sources for oncology and considerations for research. CA Cancer J Clin 72:287–300. https://doi.org/10.3322/caac.21714
Sharma M, Duan Z, Zhao H, Giordano SH, Chavez-MacGregor M (2020) Real-world patterns of everolimus use in patients with metastatic breast cancer. Oncologist 25:937–942. https://doi.org/10.1634/theoncologist.2019-0602
Maclean JC, Halpern MT, Hill SC, Pesko MF (2020) The effect of Medicaid expansion on prescriptions for breast cancer hormonal therapy medications. Health Serv Res 55:399–410. https://doi.org/10.1111/1475-6773.13289
Bowles EJ, Wellman R, Feigelson HS, Onitilo AA, Freedman AN, Delate T et al (2012) Risk of heart failure in breast cancer patients after anthracycline and trastuzumab treatment: a retrospective cohort study. J Natl Cancer Inst 104:1293–1305. https://doi.org/10.1093/jnci/djs317
Henley SJ, Ward EM, Scott S, Ma J, Anderson RN, Firth AU et al (2020) Annual report to the nation on the status of cancer, part I: National cancer statistics. Cancer 126:2225–2249. https://doi.org/10.1002/cncr.32802
Davies J, Martinec M, Delmar P, Coudert M, Bordogna W, Golding S et al (2018) Comparative effectiveness from a single-arm trial and real-world data: alectinib versus ceritinib. J Comp Eff Res 7:855–865. https://doi.org/10.2217/cer-2018-0032
Field M, Hardcastle N, Jameson M, Aherne N, Holloway L (2021) Machine learning applications in radiation oncology. Phys Imaging Radiat Oncol 19:13–24. https://doi.org/10.1016/j.phro.2021.05.007
Yu P, Gandhidasan S, Miller AA (2010) Different usage of the same oncology information system in two hospitals in Sydney–lessons go beyond the initial introduction. Int J Med Inform 79:422–429. https://doi.org/10.1016/j.ijmedinf.2010.03.003
LeLaurin JH, Gurka MJ, Chi X, Lee JH, Hall J, Warren GW et al (2021) Concordance between electronic health record and tumor registry documentation of smoking status among patients with cancer. JCO Clin Cancer Inform 5:518–526. https://doi.org/10.1200/cci.20.00187
Schorer AE, Moldwin R, Koskimaki J, Bernstam EV, Venepalli NK, Miller RS et al (2022) Chasm between cancer quality measures and electronic health record data quality. JCO Clin Cancer Inform 6:e2100128. https://doi.org/10.1200/cci.21.00128
Elekta. MOSAIQ® Radiation Oncology. In: https://www.elekta.com/products/oncology-informatics/mosaiq-plaza/medical-oncology/
Field M, Thwaites DI, Carolan M, Delaney GP, Lehmann J, Sykes J et al (2022) Infrastructure platform for privacy-preserving distributed machine learning development of computer-assisted theragnostics in cancer. J Biomed Inform 134:104181. https://doi.org/10.1016/j.jbi.2022.104181
Kotevski DP, Smee RI, Vajdic CM, Field M (2022) Machine learning and nomogram prognostic modeling for 2-year head and neck cancer–specific survival using electronic health record data: a multisite study. JCO Clin Cancer Inform early access
Landis JR, Koch GG (1977) The measurement of observer agreement for categorical data. Biometrics 33:159–174.
Rutzner S, Fietkau R, Ganslandt T, Prokosch HU, Lubgan D (2017) Electronic support for retrospective analysis in the field of radiation oncology: proof of principle using an example of fractionated stereotactic radiotherapy of 251 meningioma patients. Front Oncol 7:16. https://doi.org/10.3389/fonc.2017.00016
Egelmeer AG, Velazquez ER, de Jong JM, Oberije C, Geussens Y, Nuyts S et al (2011) Development and validation of a nomogram for prediction of survival and local control in laryngeal carcinoma patients treated with radiotherapy alone: a cohort study based on 994 patients. Radiother Oncol 100:108–115. https://doi.org/10.1016/j.radonc.2011.06.023
Taggart J, Liaw ST, Yu H (2015) Structured data quality reports to improve EHR data quality. Int J Med Inform 84:1094–1098. https://doi.org/10.1016/j.ijmedinf.2015.09.008
Evans SB, Fraass BA, Berner P, Collins KS, Nurushev T, O’Neill MJ et al (2016) Standardizing dose prescriptions: an ASTRO white paper. Pract Radiat Oncol 6:e369-e381. https://doi.org/10.1016/j.prro.2016.08.007
Phillips MH, Serra LM, Dekker A, Ghosh P, Luk SMH, Kalet A et al (2020) Ontologies in radiation oncology. Phys Med 72:103–113. https://doi.org/10.1016/j.ejmp.2020.03.017
Mayo CS, Moran JM, Bosch W, Xiao Y, McNutt T, Popple R et al (2018) American Association of Physicists in Medicine Task Group 263: standardizing nomenclatures in radiation oncology. Int J Radiat Oncol Biol Phys 100:1057–1066. https://doi.org/10.1016/j.ijrobp.2017.12.013
Harris BN, Pipkorn P, Nguyen KNB, Jackson RS, Rao S, Moore MG et al (2019) Association of adjuvant radiation therapy with survival in patients with advanced cutaneous squamous cell carcinoma of the head and neck. JAMA Otolaryngol Head Neck Surg 145:153–158. https://doi.org/10.1001/jamaoto.2018.3650
VanderWalde NA, Meyer AM, Deal AM, Layton JB, Liu H, Carpenter WR et al (2014) Effectiveness of chemoradiation for head and neck cancer in an older patient population. Int J Radiat Oncol Biol Phys 89:30–37. https://doi.org/10.1016/j.ijrobp.2014.01.053
Nind T, Sutherland J, McAllister G, Hardy D, Hume A, MacLeod R et al (2020) An extensible big data software architecture managing a research resource of real-world clinical radiology data linked to other health data from the whole Scottish population. Gigascience https://doi.org/10.1093/gigascience/giaa095
Chambers DA, Amir E, Saleh RR, Rodin D, Keating NL, Osterman TJ et al (2019) The impact of big data research on practice, policy, and cancer care. Am Soc Clin Oncol Educ Book 39:e167-e175. https://doi.org/10.1200/edbk_238057
Payne T, Fellner J, Dugowson C, Liebovitz D, Fletcher G (2012) Use of more than one electronic medical record system within a single health care organization. Appl Clin Inform 3:462–474. https://doi.org/10.4338/aci-2012-10-ra-0040
Evans WK, Ashbury FD, Hogue GL, Smith A, Pun J (2014) Implementing a regional oncology information system: approach and lessons learned. Curr Oncol 21:224–233. https://doi.org/10.3747/co.21.1923
Cecchini M, Framski K, Lazette P, Vega T, Strait M, Adelson K (2016) Electronic intervention to improve structured cancer stage data capture. J Oncol Pract 12:e949–e956. https://doi.org/10.1200/jop.2016.013540
Pasalic D, Reddy JP, Edwards T, Pan HY, Smith BD (2018) Implementing an electronic data capture system to improve clinical workflow in a large academic radiation oncology practice. JCO Clin Cancer Inform 2:1–12. https://doi.org/10.1200/cci.18.00034
Kotevski DP, Smee RI, Field M, Nemes YN, Broadley K, Vajdic CM (2022) Evaluation of an automated Presidio anonymisation model for unstructured radiation oncology electronic medical records in an Australian setting. Int J Med Inform 168:104880. https://doi.org/10.1016/j.ijmedinf.2022.104880
Karimi YH, Blayney DW, Kurian AW, Shen J, Yamashita R, Rubin D et al (2021) Development and use of natural language processing for identification of distant cancer recurrence and sites of distant recurrence using unstructured electronic health record data. JCO Clin Cancer Inform 5:469–478. https://doi.org/10.1200/cci.20.00165
Zeng J, Banerjee I, Henry AS, Wood DJ, Shachter RD, Gensheimer MF et al (2021) Natural language processing to identify cancer treatments with electronic medical records. JCO Clin Cancer Inform 5:379–393. https://doi.org/10.1200/cci.20.00173
Glaser AP, Jordan BJ, Cohen J, Desai A, Silberman P, Meeks JJ (2018) Automated extraction of grade, stage, and quality information from transurethral resection of bladder tumor pathology reports using natural language processing. JCO Clin Cancer Inform 2:1–8. https://doi.org/10.1200/cci.17.00128
Kehl KL, Xu W, Lepisto E, Elmarakeby H, Hassett MJ, Van Allen EM et al (2020) Natural language processing to ascertain cancer outcomes from medical oncologist notes. JCO Clin Cancer Inform 4:680–690. https://doi.org/10.1200/cci.20.00020
Acknowledgements
The authors would like to thank the past and present members of the POWH Head & Neck Research team for their hard work and dedication towards the conception and curation of the head and neck cancer research dataset. The authors would also like to thank Peter Geelan-Smith from Stats Central, UNSW Sydney, for assistance with statistical analysis. Support for this work was provided by the Australian Government Research Training Program Scholarship. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Funding
Open Access funding enabled and organized by CAUL and its Member Institutions. The authors did not receive support from any organization for the submitted work.
Author information
Contributions
Conceptualization: all authors; Methodology: Damian Kotevski, Matthew Field, Kathryn Broadley; Formal analysis and investigation: Damian Kotevski; Writing – original draft preparation: Damian Kotevski, Claire Vajdic; Writing – review and editing: all authors.
Ethics declarations
Ethics approval
Approval was obtained from the NSW Population and Health Services Research Ethics Committee (2019/ETH12196). The procedures used in this study adhere to the tenets of the Declaration of Helsinki.
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kotevski, D.P., Smee, R.I., Field, M. et al. The Utility of Oncology Information Systems for Prognostic Modelling in Head and Neck Cancer. J Med Syst 47, 9 (2023). https://doi.org/10.1007/s10916-023-01907-6