Dear Editor,

We read with interest the article by Hasegawa and colleagues [1], which compared oncological outcomes between omentum-preserving gastrectomy and gastrectomy with omentectomy for advanced gastric cancer in a single-center retrospective cohort, using propensity score matching to minimize selection bias. The authors attempted to adjust for confounding factors with propensity score matching, and they concluded that omentum-preserving gastrectomy (group P) for advanced gastric cancer neither increased the peritoneal relapse rate nor affected patient survival compared with conventional gastrectomy (group R) [1]. However, we have several important concerns about the statistical methods used in this study.

The propensity score was developed to reduce differences in patients' covariates that could confound the estimation of treatment effects in observational studies. It is defined as a subject's probability of receiving the treatment, conditional on observed baseline covariates [2]. Propensity score matching is effective only when it is conducted properly. In this study, it should have been used to match patients who underwent omentum-resecting and omentum-preserving gastrectomy on the similarity of their backgrounds, as assessed from preoperative information [3]. However, the authors constructed their propensity score from only four variables: age, gender, p-stage, and extent of lymph node dissection. In doing so, they implicitly assumed that these factors determined whether patients were allocated to omentum-resecting or omentum-preserving gastrectomy. First, p-stage is derived from postoperative pathological findings, and the extent of lymphadenectomy from operative findings. Logically, these factors cannot have influenced the preoperative choice of surgical procedure. Second, after matching on a propensity score built from these variables, several patient characteristics remained imbalanced: older patients were more frequent in group R; laparoscopic surgery and adjuvant chemotherapy were significantly more frequent in group P; and splenectomy was performed more often in group R. These imbalances could plausibly bias the outcomes in favor of group P. Indeed, group P had better relapse-free survival and overall survival than group R, although the differences were not statistically significant. We are therefore led to infer that the variables included in the propensity score model did not remove important selection bias.
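
To illustrate the point about covariate choice, the following Python sketch estimates a propensity score from preoperative covariates only, performs matching, and then checks covariate balance with standardized mean differences (SMD). It is a minimal sketch under stated assumptions, not the authors' method: the covariates (age, a clinical T4 flag, a comorbidity flag), the allocation rule, the simulated cohort, the greedy 1:1 nearest-neighbor matching, and the caliper of 0.2 standard deviations of the logit (a commonly used choice) are all hypothetical.

```python
import math
import random
import statistics


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_logistic(X, z, lr=0.5, n_iter=1000):
    """Fit logit P(z = 1 | x) = b0 + b.x by batch gradient ascent on the log-likelihood."""
    p = len(X[0])
    beta = [0.0] * (p + 1)  # beta[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, zi in zip(X, z):
            resid = zi - sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / len(X) for b, g in zip(beta, grad)]
    return beta


def greedy_match(logit_ps, z, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbor matching without replacement on the logit of the
    propensity score; treated subjects with no control inside the caliper stay unmatched."""
    caliper = caliper_sd * statistics.stdev(logit_ps)
    controls = [i for i, zi in enumerate(z) if zi == 0]
    used, pairs = set(), []
    for t in (i for i, zi in enumerate(z) if zi == 1):
        close = [c for c in controls
                 if c not in used and abs(logit_ps[t] - logit_ps[c]) <= caliper]
        if close:
            best = min(close, key=lambda c: abs(logit_ps[t] - logit_ps[c]))
            used.add(best)
            pairs.append((t, best))
    return pairs


def smd(a, b):
    """Standardized mean difference of one covariate between two groups."""
    diff = statistics.mean(a) - statistics.mean(b)
    pooled = math.sqrt((statistics.variance(a) + statistics.variance(b)) / 2)
    if pooled == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / pooled


# Hypothetical cohort; every covariate is known BEFORE surgery (p-stage is deliberately absent).
random.seed(1)
rows, z = [], []
for _ in range(400):
    age = random.gauss(68, 10)
    ct4 = 1 if random.random() < 0.3 else 0     # hypothetical clinical T4 flag
    comorb = 1 if random.random() < 0.4 else 0  # hypothetical comorbidity flag
    # Hypothetical allocation rule: z = 1 (group P) is less likely for older, cT4, comorbid patients.
    lp = 0.3 - 0.05 * (age - 68) - 1.0 * ct4 - 0.7 * comorb
    z.append(1 if random.random() < sigmoid(lp) else 0)
    rows.append([age, ct4, comorb])

mu = statistics.mean(r[0] for r in rows)
sd = statistics.stdev(r[0] for r in rows)
X = [[(r[0] - mu) / sd, r[1], r[2]] for r in rows]  # standardize age only

beta = fit_logistic(X, z)
logit_ps = [beta[0] + sum(b * x for b, x in zip(beta[1:], xi)) for xi in X]
pairs = greedy_match(logit_ps, z)

for j, name in enumerate(["age", "cT4", "comorbidity"]):
    before = smd([X[i][j] for i in range(len(z)) if z[i] == 1],
                 [X[i][j] for i in range(len(z)) if z[i] == 0])
    after = smd([X[t][j] for t, _ in pairs], [X[c][j] for _, c in pairs])
    print(f"{name:12s} SMD before {before:+.2f}  after {after:+.2f}")
```

Reporting SMDs for every baseline characteristic, including those not entered into the model (such as surgical approach or adjuvant chemotherapy), is one way to make residual imbalance of the kind described above visible after matching.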

In our view, such an analysis contradicts the definition of the propensity score, namely "the conditional probability of being treated given the individual's covariates." Moreover, the absence of a significant difference in oncological outcomes between the two groups may simply reflect the fact that the cases were matched on pathological T and N stage, which are defined precisely to predict oncological prognosis. If the authors wished to compare the two groups while accounting for postoperative pathological results, a conventional multivariable analysis would have been more appropriate, although this too would not eliminate the selection bias inherent in this retrospective cohort.

We believe that the propensity score model should include more preoperative information [4], such as comorbidities and clinical characteristics of the tumor (including clinical or surgical TNM stage, biopsy findings, and tumor location), because we routinely use this information to decide the type of procedure before surgery. After matching, oncological outcomes should then be compared between the two groups with the pathological TNM stage additionally taken into account.
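
The stratified comparison proposed above can be illustrated with a Mantel-Haenszel odds ratio for peritoneal relapse, pooled across pathological stage strata of a matched cohort. This Python sketch is a simplification under stated assumptions: relapse is treated as a binary outcome, ignoring follow-up time and censoring (a log-rank test or Cox model stratified by p-stage would be the survival-analysis counterpart); the pair structure of the matched sample is not used; and the stage labels, counts, and helper names are hypothetical.

```python
import math
from collections import defaultdict


def mantel_haenszel_or(records):
    """Mantel-Haenszel odds ratio of an event for group 1 versus group 0, pooled over strata.

    records: iterable of (stratum, group, event) tuples, where group 1 = omentum-preserving
    (group P), group 0 = omentectomy (group R), and event 1 = peritoneal relapse.
    """
    # Per stratum: a = P with event, b = P without, c = R with event, d = R without.
    cells = defaultdict(lambda: [0, 0, 0, 0])
    for stratum, group, event in records:
        idx = (0 if event else 1) if group == 1 else (2 if event else 3)
        cells[stratum][idx] += 1
    num = den = 0.0
    for a, b, c, d in cells.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if den == 0:
        return math.nan if num == 0 else math.inf
    return num / den


def expand(stratum, group, events, non_events):
    """Turn hypothetical aggregate counts into per-patient records."""
    return [(stratum, group, 1)] * events + [(stratum, group, 0)] * non_events


# Hypothetical counts in a matched cohort, stratified by pathological stage.
records = (expand("pStage II", 1, 2, 18) + expand("pStage II", 0, 3, 17)
           + expand("pStage III", 1, 6, 14) + expand("pStage III", 0, 7, 13))
print(f"MH odds ratio (P vs R): {mantel_haenszel_or(records):.3f}")
# prints "MH odds ratio (P vs R): 0.737"
```

Pooling within strata in this way compares the groups at the same pathological stage rather than using p-stage to construct the propensity score itself.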