Abstract
This article proposes a simple nonparametric measure for diagnosing cancer and assessing cancer intensity for an individual patient, without resorting to group data, dimensionality reduction, scaling, or the estimation of weights. The measure also identifies the critical areas/variables requiring attention, can be applied to all non-nominal data, can be used to find the mean, variance, and confidence interval for group data, and facilitates statistical tests of hypotheses.
The cancer intensity measure facilitates ranking and classifying a group of patients and quantifying the progress of treatment at the individual and group levels. Using suitably designed group data, an attempt can be made to find, for each type of cancer, a small interval of cancer intensity values that may be associated with Stage IV or metastatic cancer. The proposed measure offers an alternative approach to estimating the survival function of cancer patients. This study opens up a number of new areas of statistical analysis in cancer treatment. An empirical study based on this theoretical work will be of vital interest.
Appendix
1.1 Statistical Notes
1. The chi-square test is a nonparametric test that compares observed frequencies with the frequencies expected under a hypothesis, usually for cross-tabulated data from two or more samples. It is also used as a goodness-of-fit test for log-linear models.
2. Regression analysis models the relationship between a dependent variable and one or more independent variables.
3. Logistic regression analysis deals with a binary dependent variable and one or more independent variables measured on nominal, ordinal, interval, or ratio scales.
4. Factor analysis (FA) and principal component analysis (PCA) are both multivariate statistical techniques for reducing the number of variables. PCA forms weighted linear combinations of the observed variables that successively capture the maximum possible variance, while FA explains the covariance among the variables in terms of a smaller number of underlying factors.
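5. Wilks' Lambda is the ratio of the within-group sum of squares to the total sum of squares (with several variables, the ratio of the determinants of the corresponding sums-of-squares-and-cross-products matrices). It lies between 0 and 1: it is high when the observed group means are nearly equal and small when the group means differ.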
6. A Bayesian network consists of a set of nodes (random variables) and a set of directed edges (direct dependencies between the variables). The major difficulties are specifying the prior (a priori) probabilities and computing the posterior (a posteriori) probabilities.
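7. The k-nearest neighbors (KNN) algorithm is used primarily in classification problems: a new observation is assigned to the class that is most common among the k training observations closest to it.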
8. A neural network starts with a set of input variables Xi and associated weights Wi, for i = 1, 2, …, n. A function f is determined whose domain is the weighted sum W1X1 + W2X2 + … + WnXn and whose range is an output Y. Because training a neural network can yield multiple solutions associated with local minima, the resulting network may not be robust across different samples.
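9. Nearest shrunken centroid classification calculates a standardized centroid for each class: for each gene, the class average of gene expression is expressed relative to the within-class standard deviation of that gene. These centroids are shrunk toward the overall centroid, and a new sample is assigned to the class with the nearest shrunken centroid.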
10. Random forest (random decision forest) is a machine learning algorithm for classification, regression, and other tasks. It constructs a large ensemble of decision trees and combines their outputs (by majority vote for classification or by averaging for regression) to obtain more accurate and stable predictions.
11. A support vector machine (SVM) is a nonparametric method that has been used both to classify tissue samples and to explore the data for mislabeled or questionable tissue samples. In an SVM, the marginal contribution of each component ratio to the score varies, and the choice of input variables has a decisive influence on performance.
12. Cluster analysis groups a set of objects in such a way that objects in the same group are more similar to each other than to those in the other groups.
13. When the dependent variable is categorical and the independent variables are measured on interval or ratio scales, discriminant analysis develops discriminant functions that discriminate between the categories of the dependent variable.
14. The log-rank test compares the survival distributions of two samples; it is suited to survival data, which are typically right-skewed and right-censored.
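The comparison of observed and expected frequencies in note 1 can be illustrated for a contingency table. The following is a minimal sketch, assuming an r × c table of counts with no zero row or column totals; the function name `chi_square_statistic` and the counts are hypothetical. Expected counts are computed as row total × column total / grand total, and the statistic is the sum of (O − E)²/E over all cells, with (r − 1)(c − 1) degrees of freedom.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand  # expected count under independence
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical 2 x 2 table: rows = two patient groups, columns = response / no response
stat, dof = chi_square_statistic([[10, 20], [20, 10]])
print(round(stat, 3), dof)  # 6.667 1
```

For one degree of freedom, a statistic above 3.84 is significant at the 5% level.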
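For the single-predictor case of note 2, the least-squares regression line can be written in a few lines. This is a minimal sketch, assuming one independent variable and the model y = a + b·x; the function name and data are hypothetical. The slope is the ratio of the cross-product sum Sxy to the sum of squares Sxx, and the intercept makes the line pass through the point of means.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares fit of y = a + b*x for one independent variable."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)
    return intercept, slope

intercept, slope = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope)  # 1.0 2.0
```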
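The binary-outcome model of note 3 can be sketched with one numeric predictor. This assumes P(y = 1 | x) = 1 / (1 + exp(−(b0 + b1·x))) and fits b0 and b1 by plain gradient ascent on the log-likelihood with a fixed, hypothetical learning rate and iteration count; statistical packages normally use Newton-type methods instead. Nominal predictors would first need dummy (0/1) coding. Function names and data are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(x, y, lr=0.01, iterations=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        residuals = [yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)]
        b0 += lr * sum(residuals)                              # d logL / d b0
        b1 += lr * sum(r * xi for r, xi in zip(residuals, x))  # d logL / d b1
    return b0, b1

# Hypothetical data: x = a numeric marker, y = 1 if the outcome is present
x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic(x, y)
print(round(sigmoid(b0 + b1 * 1), 2), round(sigmoid(b0 + b1 * 6), 2))
```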
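The variance-maximizing property of PCA in note 4 can be seen by computing the first principal component directly. This minimal sketch, assuming numeric data with at least two observations, finds the leading eigenvector of the sample covariance matrix by power iteration and reports the share of total variance it explains; the function name and data are hypothetical, and power iteration can fail if the starting vector happens to be orthogonal to the leading eigenvector.

```python
import math

def first_principal_component(data, iterations=100):
    """Leading eigenvector of the sample covariance matrix, found by power iteration."""
    n, p = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in data]
    cov = [[sum(r[a] * r[b] for r in centered) / (n - 1) for b in range(p)] for a in range(p)]
    v = [1.0] + [0.0] * (p - 1)  # starting vector
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    variance = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))  # variance along v
    total = sum(cov[a][a] for a in range(p))
    return v, variance / total

# Hypothetical two-variable data lying close to the line y = x
data = [[1, 1.1], [2, 1.9], [3, 3.2], [4, 3.9], [5, 5.1]]
v, explained = first_principal_component(data)
print([round(c, 2) for c in v], round(explained, 3))
```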
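The ratio defined in note 5 can be computed directly in the single-variable case. This is a minimal sketch, assuming one dependent variable measured in two or more groups (the multivariate version would use determinants of matrices instead); the function name and data are hypothetical.

```python
def wilks_lambda(groups):
    """Univariate Wilks' Lambda: within-group sum of squares divided by total sum of squares."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    within_ss = 0.0
    for g in groups:
        m = sum(g) / len(g)
        within_ss += sum((v - m) ** 2 for v in g)
    return within_ss / total_ss

print(wilks_lambda([[1, 2, 3], [7, 8, 9]]))  # group means differ: about 0.069
print(wilks_lambda([[1, 2, 3], [1, 2, 3]]))  # identical group means: 1.0
```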
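The role of the prior and posterior probabilities in note 6 shows up even in the smallest network, Cancer → Test. This minimal sketch applies Bayes' theorem to that two-node network; the prior, sensitivity, and false-positive rate are hypothetical numbers chosen only for illustration, not clinical values.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """P(Cancer = yes | Test = positive) in a two-node network Cancer -> Test."""
    # marginal probability of a positive test, summing over both states of Cancer
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive

# Hypothetical probabilities, chosen only for illustration
p = posterior_given_positive(prior=0.01, sensitivity=0.9, false_positive_rate=0.05)
print(round(p, 3))  # 0.154
```

Even with a fairly accurate test, a low prior keeps the posterior modest, which is why specifying the prior matters.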
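The voting rule of note 7 can be sketched in a few lines. This assumes numeric feature vectors, Euclidean distance, and k = 3 (all choices of this example, not fixed by the method); `math.dist` requires Python 3.8 or later, and the function name, points, and labels are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3):
    """Majority label among the k training points nearest to `query` (Euclidean distance)."""
    by_distance = sorted(zip(train_points, train_labels), key=lambda pair: math.dist(pair[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_classify(points, labels, (1, 1)))  # A
print(knn_classify(points, labels, (5, 4)))  # B
```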
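The mapping in note 8 from the weighted sum W1X1 + … + WnXn to an output Y can be sketched for a single neuron. This assumes a logistic (sigmoid) choice of f and an optional bias term, both common conventions rather than anything fixed by the note; the function name and numbers are hypothetical.

```python
import math

def neuron_output(x, w, bias=0.0):
    """Single artificial neuron: logistic function f applied to the weighted sum of inputs."""
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x)) + bias
    return 1.0 / (1.0 + math.exp(-weighted_sum))

print(neuron_output([2.0, 4.0], [0.5, -0.25]))  # weighted sum is 0, so output is 0.5
```

Training a network means adjusting the weights Wi to reduce prediction error; different starting weights can end in different local minima.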
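The standardize, shrink, and classify steps of note 9 can be sketched as follows. This is a simplified version under stated assumptions: it standardizes each class-versus-overall difference by the pooled within-class standard deviation, shrinks it toward zero by soft thresholding with a hypothetical parameter `delta`, and omits the class-size factor, variance offset, and class priors used in the published method. All names and the two-gene data are hypothetical.

```python
import math

def soft_threshold(d, delta):
    """Shrink d towards zero by delta, stopping at zero."""
    return math.copysign(max(abs(d) - delta, 0.0), d)

def fit_shrunken_centroids(data, labels, delta):
    classes = sorted(set(labels))
    n, p = len(data), len(data[0])
    overall = [sum(row[j] for row in data) / n for j in range(p)]
    by_class = {c: [row for row, lab in zip(data, labels) if lab == c] for c in classes}
    means = {c: [sum(r[j] for r in rows) / len(rows) for j in range(p)] for c, rows in by_class.items()}
    sds = []  # pooled within-class standard deviation of each gene
    for j in range(p):
        ss = sum((r[j] - means[c][j]) ** 2 for c, rows in by_class.items() for r in rows)
        sds.append(math.sqrt(ss / (n - len(classes))))
    centroids = {}
    for c in classes:
        centroid = []
        for j in range(p):
            d = (means[c][j] - overall[j]) / sds[j]  # standardized class-vs-overall difference
            centroid.append(overall[j] + sds[j] * soft_threshold(d, delta))
        centroids[c] = centroid
    return centroids, sds

def nsc_predict(x, centroids, sds):
    """Assign x to the class whose shrunken centroid is nearest in standardized distance."""
    def distance(c):
        return sum(((xj - cj) / s) ** 2 for xj, cj, s in zip(x, centroids[c], sds))
    return min(centroids, key=distance)

# Hypothetical expression data: gene 1 separates the classes, gene 2 does not
data = [[1.0, 5.0], [1.2, 4.8], [0.8, 5.2], [3.0, 5.1], [3.2, 4.9], [2.8, 5.0]]
labels = ["A", "A", "A", "B", "B", "B"]
centroids, sds = fit_shrunken_centroids(data, labels, delta=1.0)
print(nsc_predict([1.1, 5.0], centroids, sds), nsc_predict([2.9, 5.3], centroids, sds))
```

With this `delta`, gene 2 is shrunk all the way to the overall mean in both classes, so it no longer affects classification.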
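The bootstrap-and-vote idea behind note 10 can be sketched with very small trees. This is a deliberately simplified forest, assuming depth-1 trees (decision stumps), one randomly chosen variable per tree, a hypothetical 25 trees, and a fixed random seed; real random forests grow deeper trees and sample variables at every split. All names and data are hypothetical.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, feature):
    """Depth-1 decision tree on one feature, choosing the threshold with fewest training errors."""
    values = sorted(set(row[feature] for row in X))
    best = (len(y) + 1, None, majority(y), majority(y))  # (errors, threshold, left label, right label)
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [lab for row, lab in zip(X, y) if row[feature] <= t]
        right = [lab for row, lab in zip(X, y) if row[feature] > t]
        l_lab, r_lab = majority(left), majority(right)
        errors = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
        if errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    return feature, best[1], best[2], best[3]

def stump_predict(stump, x):
    feature, t, l_lab, r_lab = stump
    if t is None:  # only one distinct value in the sample: constant prediction
        return l_lab
    return l_lab if x[feature] <= t else r_lab

def fit_forest(X, y, n_trees=25, seed=0):
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, drawn with replacement
        feature = rng.randrange(p)                  # one randomly chosen variable per tree
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest

def forest_predict(forest, x):
    """Majority vote over all trees."""
    return majority([stump_predict(s, x) for s in forest])

X = [[0, 0.5], [0.5, 1], [1, 0], [1.5, 1.5], [8, 9], [9, 8], [9.5, 10], [10, 9.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(X, y)
print(forest_predict(forest, [0.5, 0.5]), forest_predict(forest, [9.5, 9.5]))
```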
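The classifier in note 11 can be illustrated with a linear SVM trained by stochastic subgradient descent on the hinge loss (the Pegasos scheme). This minimal sketch assumes labels coded +1/−1, data centred around the origin (no bias term is fitted), and a hypothetical regularization constant `lam`; it does not reproduce the tissue-sample analyses mentioned in the note. Function names and data are hypothetical.

```python
def train_linear_svm(X, y, lam=0.1, epochs=100):
    """Pegasos-style subgradient descent on the regularized hinge loss; labels are +1 or -1.

    No bias term is fitted, so the data are assumed to be centred around the origin."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # step for the regularization term
            if margin < 1:  # hinge loss is active: move w towards classifying x correctly
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w

def svm_predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

X = [(-2, -1), (-1, -2), (-2, -2), (2, 1), (1, 2), (2, 2)]
y = [-1, -1, -1, 1, 1, 1]
w = train_linear_svm(X, y)
print([svm_predict(w, x) for x in X])
```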
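Note 12 does not name a particular clustering method; k-means is one common choice and is used here only as an illustration. This minimal sketch assumes numeric points, Euclidean distance, and user-supplied starting centroids; `math.dist` requires Python 3.8 or later, and the function name and data are hypothetical.

```python
import math

def k_means(points, initial_centroids, iterations=10):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid update."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(iterations):
        labels = [min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k])) for p in points]
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid unchanged
        centroids = new_centroids
    return labels, centroids

points = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
labels, centroids = k_means(points, [points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```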
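For two categories and two interval-scale variables, the discriminant function of note 13 can be sketched with Fisher's linear discriminant. This assumes equal prior probabilities (so the cut-off is the midpoint between the projected class means) and a non-singular within-class scatter matrix; the function names and data are hypothetical.

```python
def fisher_discriminant_2d(class0, class1):
    """Fisher's linear discriminant for two classes of 2-D points.

    Returns weights w = Sw^{-1}(m1 - m0) and the midpoint threshold (equal priors assumed)."""
    def mean(points):
        return [sum(c) / len(points) for c in zip(*points)]

    m0, m1 = mean(class0), mean(class1)
    a = b = d = 0.0  # pooled within-class scatter matrix Sw = [[a, b], [b, d]]
    for points, m in ((class0, m0), (class1, m1)):
        for x, y in points:
            dx, dy = x - m[0], y - m[1]
            a += dx * dx
            b += dx * dy
            d += dy * dy
    det = a * d - b * b
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    w = ((d * diff[0] - b * diff[1]) / det, (-b * diff[0] + a * diff[1]) / det)  # Sw^{-1} @ diff
    midpoint = ((m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2)
    threshold = w[0] * midpoint[0] + w[1] * midpoint[1]
    return w, threshold

def discriminant_classify(w, threshold, point):
    return 1 if w[0] * point[0] + w[1] * point[1] > threshold else 0

class0 = [(1, 2), (2, 1), (2, 3), (3, 2)]
class1 = [(6, 7), (7, 6), (7, 8), (8, 7)]
w, threshold = fisher_discriminant_2d(class0, class1)
print(w, threshold)  # (1.25, 1.25) 11.25
```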
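The two-sample comparison in note 14 can be sketched with the usual log-rank construction: at each distinct event time, the observed number of events in the first group is compared with the number expected given how many patients are still at risk, and the squared sum of differences is divided by its variance. This assumes event indicators coded 1 for an observed event and 0 for a censored time, and uses the large-sample chi-square approximation with 1 degree of freedom; the function name and survival times are hypothetical.

```python
import math

def log_rank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test; events are 1 for an observed event, 0 for a censored time.

    Returns the chi-square statistic (1 degree of freedom) and its approximate p-value."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)] + [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = [(ti, ei, g) for ti, ei, g in data if ti >= t]
        n = len(at_risk)
        n_a = sum(1 for _, _, g in at_risk if g == 0)
        d = sum(1 for ti, ei, _ in at_risk if ti == t and ei)
        d_a = sum(1 for ti, ei, g in at_risk if ti == t and ei and g == 0)
        observed_minus_expected += d_a - d * n_a / n  # observed minus expected events in group A
        if n > 1:
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    statistic = observed_minus_expected ** 2 / variance
    p_value = math.erfc(math.sqrt(statistic / 2))  # upper tail of chi-square with 1 df
    return statistic, p_value

# Hypothetical survival times; one patient in group B is censored at time 5
statistic, p_value = log_rank_test([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 0, 1])
print(round(statistic, 3), round(p_value, 4))
```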
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Chakrabartty, S.N., Talukdar, G.C. (2022). Statistics in Cancer: Diagnosis, Disease Progression, Treatment Efficacy, and Patient Survival Studies. In: Basu, S.K., Panda, C.K., Goswami, S. (eds) Cancer Diagnostics and Therapeutics . Springer, Singapore. https://doi.org/10.1007/978-981-16-4752-9_22
DOI: https://doi.org/10.1007/978-981-16-4752-9_22
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-4751-2
Online ISBN: 978-981-16-4752-9
eBook Packages: Biomedical and Life Sciences, Biomedical and Life Sciences (R0)