Statistics in Cancer: Diagnosis, Disease Progression, Treatment Efficacy, and Patient Survival Studies

Chapter in Cancer Diagnostics and Therapeutics

Abstract

This chapter proposes a simple nonparametric measure for diagnosing cancer and assessing cancer intensity for an individual, without resorting to group data, dimensionality reduction, scaling, or weight determination. The measure also identifies the critical areas/variables requiring attention, can be applied to all non-nominal data, can be used to find the mean, variance, and confidence interval for group data, and facilitates statistical tests of hypotheses.

Cancer intensity facilitates ranking or classifying a group of patients and quantifying the progress of treatment at the individual and group levels. Using suitably designed group data, an attempt can be made to find, for each type of cancer, a small interval of cancer-intensity values that may be associated with Stage IV or metastatic cancer. The proposed measure of cancer intensity also offers an alternative approach to estimating the survival function of cancer patients. The study opens a number of new areas of statistical analysis in cancer treatment; an empirical study based on this theoretical work would be of vital interest.

Appendix

1.1 Statistical Notes

1. The chi-square test is a nonparametric test that compares the observed frequencies of values (usually in cross-tabulated data) across two or more samples with the expected frequencies; it is also used as a goodness-of-fit test for log-linear models.

2. Regression analysis establishes the relationship of a dependent variable with one or more independent variables.

3. Logistic regression analysis deals with a binary dependent variable and one or more independent variables measured at the nominal, ordinal, interval, or ratio level.

4. Factor Analysis (FA) and Principal Component Analysis (PCA) are multivariate statistical techniques for reduction of data/variables. PCA forms weighted linear combinations of the observed variables that successively maximize the variance explained, while FA explains the covariance among the variables in terms of a smaller number of latent factors.

5. Wilks' Lambda is the ratio of the within-group sum of squares to the total sum of squares. When the observed group means are nearly equal, Wilks' Lambda is close to one; a small Lambda occurs when the group means differ.

6. A Bayesian network consists of a set of nodes (random variables) and a set of directed edges (direct dependencies between the variables). Major difficulties are specifying the prior (a priori) probabilities and computing the posterior (a posteriori) probabilities.

7. The k-nearest neighbors (KNN) algorithm is used primarily in classification problems.

8. A neural network starts with a set of input variables Xi and associated weights Wi, i = 1, 2, …, n; an activation function f maps the weighted sum W1X1 + W2X2 + … + WnXn to an output Y. Neural networks, which can have multiple solutions associated with local minima, may not be robust across different samples.

9. Nearest shrunken centroid classification calculates a standardized centroid for each class, expressed for each gene as the ratio of the class-average expression of that gene to its within-class standard deviation.

10. Random forest (random decision forest) is a machine-learning algorithm used for classification, regression, and other tasks; it constructs an ensemble of decision trees and combines their predictions for a more accurate and stable result.

11. Support vector machine (SVM) is a nonparametric method used both for classification of tissue samples and for exploration of the data for mislabeled or questionable tissue results. The marginal contribution of each component to the score is variable, and the choice of input variables has a decisive influence on the performance results.

12. Cluster analysis groups a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups.

13. When the dependent variable is categorical and the independent variables are on an interval or ratio scale, discriminant analysis develops discriminant functions (DFs) that discriminate between the categories of the dependent variable.

14. The log-rank test compares the survival distributions of two samples when the data are right-skewed and censored. Minimal code sketches illustrating each of the methods summarized in these notes are given below.
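
The sketches that follow illustrate, in turn, the methods summarized in Notes 1–14. They use small synthetic datasets and standard NumPy/SciPy/scikit-learn (and, for the log-rank test, lifelines) APIs; all data values, variable names, and parameter settings are illustrative assumptions, not taken from the chapter. First, a minimal chi-square test of independence on a hypothetical cross-tabulation (Note 1):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2 x 3 cross-tabulation: treatment group (rows) by response category (columns)
observed = np.array([[30, 14, 6],
                     [22, 18, 10]])

# Compares observed frequencies with the frequencies expected under independence
chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.3f}, dof = {dof}, p-value = {p:.4f}")
print("expected frequencies:\n", np.round(expected, 1))
```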
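
A minimal ordinary least-squares regression of a dependent variable on two independent variables (Note 2), using simulated data:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                                         # two independent variables
y = 1.5 * X[:, 0] - 0.7 * X[:, 1] + rng.normal(scale=0.3, size=50)   # dependent variable

model = LinearRegression().fit(X, y)
print("estimated coefficients:", np.round(model.coef_, 3))
print("intercept:", round(model.intercept_, 3))
```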
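
A logistic regression with a binary dependent variable and interval-scale predictors (Note 3), again on simulated data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))                                          # interval-scale predictors
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=100) > 0).astype(int)   # binary outcome (0/1)

clf = LogisticRegression().fit(X, y)
print("coefficients:", np.round(clf.coef_, 3))
print("predicted probability of class 1 for the first subject:",
      round(clf.predict_proba(X[:1])[0, 1], 3))
```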
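
PCA and FA applied to the same simulated six-variable dataset (Note 4): PCA maximizes the variance captured by the components, while FA models the shared covariance through latent factors:

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(2)
latent = rng.normal(size=(200, 2))                                        # two underlying factors
X = latent @ rng.normal(size=(2, 6)) + 0.3 * rng.normal(size=(200, 6))   # six observed variables

pca = PCA(n_components=2).fit(X)
fa = FactorAnalysis(n_components=2).fit(X)
print("PCA explained-variance ratios:", np.round(pca.explained_variance_ratio_, 3))
print("FA loadings (2 factors x 6 variables):\n", np.round(fa.components_, 2))
```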
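
Wilks' Lambda computed directly from its definition in Note 5 (within-group sum of squares divided by total sum of squares), here for a single variable observed in three groups with different means:

```python
import numpy as np

rng = np.random.default_rng(3)
groups = [rng.normal(loc=m, scale=1.0, size=30) for m in (0.0, 0.5, 2.0)]  # three groups
pooled = np.concatenate(groups)

ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)   # within-group sum of squares
ss_total = ((pooled - pooled.mean()) ** 2).sum()               # total sum of squares
wilks_lambda = ss_within / ss_total                            # near 1 when group means are similar
print(f"Wilks' Lambda = {wilks_lambda:.3f}")
```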
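
A two-node Bayesian network, Disease -> Test, showing the prior and posterior probabilities mentioned in Note 6; all probability values are assumed for illustration:

```python
# Prior (a priori) probability of disease and assumed test characteristics
p_disease = 0.01
p_pos_given_disease = 0.95   # assumed sensitivity
p_pos_given_healthy = 0.10   # assumed false-positive rate

# Posterior (a posteriori) probability of disease given a positive test, via Bayes' theorem
p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1 - p_disease))
p_disease_given_positive = p_pos_given_disease * p_disease / p_positive
print(f"P(disease | positive test) = {p_disease_given_positive:.3f}")
```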
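
A k-nearest neighbors classifier (Note 7) that assigns a new point to the majority class among its five nearest training points:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0.0, 1.0, (40, 2)),    # class 0
               rng.normal(3.0, 1.0, (40, 2))])   # class 1
y = np.array([0] * 40 + [1] * 40)

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print("predicted class for the point (2.5, 2.5):", knn.predict([[2.5, 2.5]])[0])
```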
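
A single artificial neuron implementing the formula in Note 8, Y = f(W1X1 + … + WnXn), with a sigmoid activation; the inputs and weights are arbitrary illustrative values:

```python
import numpy as np

def sigmoid(z):
    """Logistic activation function f."""
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.8, 0.3, 0.5])    # inputs X1, X2, X3
w = np.array([0.4, -0.6, 1.2])   # weights W1, W2, W3
y = sigmoid(np.dot(w, x))        # Y = f(sum of Wi * Xi)
print(f"output Y = {y:.3f}")
```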
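
Nearest shrunken centroid classification (Note 9) via scikit-learn's NearestCentroid with a shrinkage threshold; the 20 simulated columns stand in for gene-expression values:

```python
import numpy as np
from sklearn.neighbors import NearestCentroid

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0.0, 1.0, (30, 20)),    # class 0, 20 "genes"
               rng.normal(1.0, 1.0, (30, 20))])   # class 1
y = np.array([0] * 30 + [1] * 30)

# shrink_threshold pulls each class centroid toward the overall centroid
clf = NearestCentroid(shrink_threshold=0.5).fit(X, y)
print("training accuracy:", clf.score(X, y))
```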
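
A random forest (Note 10) built as an ensemble of decision trees whose predictions are combined:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)   # nonlinear decision rule

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print("training accuracy:", rf.score(X, y))
print("feature importances:", np.round(rf.feature_importances_, 2))
```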
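
A support vector machine classifier (Note 11) with an RBF kernel on a simulated binary classification problem, evaluated on a held-out split:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

svm = SVC(kernel="rbf", C=1.0).fit(X_train, y_train)
print("test accuracy:", svm.score(X_test, y_test))
```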
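
A k-means cluster analysis (Note 12) that groups points so that members of a cluster are closer to their own cluster centre than to the other centre:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0.0, 0.5, (50, 2)),    # one cloud of points
               rng.normal(4.0, 0.5, (50, 2))])   # a second, well-separated cloud

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster centres:\n", np.round(km.cluster_centers_, 2))
```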
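
Linear discriminant analysis (Note 13) with a categorical dependent variable and interval-scale predictors:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(8)
X = np.vstack([rng.normal(0.0, 1.0, (40, 3)),    # category 0
               rng.normal(1.5, 1.0, (40, 3))])   # category 1
y = np.array([0] * 40 + [1] * 40)                # categorical dependent variable

lda = LinearDiscriminantAnalysis().fit(X, y)
print("discriminant function coefficients:", np.round(lda.coef_, 3))
```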
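
Finally, a log-rank comparison of two simulated survival samples with right censoring (Note 14). This sketch assumes the lifelines package is available; the survival times and censoring indicators are simulated:

```python
import numpy as np
from lifelines.statistics import logrank_test

rng = np.random.default_rng(9)
durations_a = rng.exponential(scale=12.0, size=50)   # survival times (months), group A
durations_b = rng.exponential(scale=18.0, size=50)   # survival times (months), group B
events_a = rng.random(50) < 0.8                      # True = event observed, False = censored
events_b = rng.random(50) < 0.8

result = logrank_test(durations_a, durations_b,
                      event_observed_A=events_a, event_observed_B=events_b)
print(f"log-rank statistic = {result.test_statistic:.3f}, p-value = {result.p_value:.4f}")
```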

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Chakrabartty, S.N., Talukdar, G.C. (2022). Statistics in Cancer: Diagnosis, Disease Progression, Treatment Efficacy, and Patient Survival Studies. In: Basu, S.K., Panda, C.K., Goswami, S. (eds) Cancer Diagnostics and Therapeutics. Springer, Singapore. https://doi.org/10.1007/978-981-16-4752-9_22
