Data Collection, Statistical Analysis and Clustering Studies of Cancer Dataset from Viziayanagaram District, AP, India
Cancer detection is a major research area that can be supported by datasets and data mining techniques. The data were collected from Vizianagaram district (village) during 2013 and comprise 328 instances and 28 attributes (Gender, Age, Cancer Type, Family_members, Drinking, Smoking, Tea, Coffee, perfumes, Morning_eat, Travelling, Wake_up, Sleep, Tensions, Cool_drinks, Icecream, Height, weight, hair_loss, Marital, milk, bath, Oil, Fast_food, other diseases, Mobile, Sports, Mosquito_replents). The dataset was analyzed using Weka version 3.6.3 and Orange v2.7. The histogram shows the highest numbers of instances for lung (56), mouth (40), bone (40), skin (32), and colon (24) cancers. More instances were observed in males (53.7%) than in females (46.3%). The disease was more frequent in married (61%) than in unmarried (39%) people; the mean age was 33.78±10.12 years, mean height 159.02±9.79 cm, and mean weight 61.55±11.69 kg. About 90.2% of patients had no other diseases, 136 patients (41.5%) preferred drinking alcohol, 72 (22%) preferred smoking, 208 (63.4%) preferred drinking tea, 96 (29.3%) preferred drinking coffee, 216 (65%) preferred eating rice, 80 (24.4%) preferred taking cool drinks, none liked ice cream, 88 (26.8%) preferred taking milk, 238 (63.4%) preferred using sunflower oil in cooking, and 68 (26.8%) preferred eating fast food. The data indicate hair loss and the use of mobile phones and mosquito repellents as major factors in cancer. The study concludes that age, gender, height, weight, marital status, tea, walking, hair loss, mobile phone use, and mosquito repellents are major factors/attributes in cancer occurrence.
Keywords: Cancer, Statistical analysis, Clustering, Vizianagaram
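Several of the percentages reported in the abstract can be recomputed from the stated counts and the dataset size of 328 instances. The following is a minimal sketch of that arithmetic, not the authors' Weka or Orange workflow; the dictionary keys and the `percentage` helper are illustrative names assumed here, while the counts are taken from the abstract.

```python
TOTAL_INSTANCES = 328  # dataset size reported in the abstract

# Counts as reported in the abstract; the keys are illustrative labels,
# not the attribute names used in the original dataset.
reported_counts = {
    "alcohol": 136,
    "smoking": 72,
    "tea": 208,
    "coffee": 96,
    "cool_drinks": 80,
    "milk": 88,
}


def percentage(count, total=TOTAL_INSTANCES):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Share of the 328 instances for each reported habit.
shares = {name: percentage(n) for name, n in reported_counts.items()}

for name, share in shares.items():
    print(f"{name}: {share}%")
```

Under these assumptions the computed shares agree with the values quoted for alcohol, smoking, tea, coffee, cool drinks, and milk.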