Significant patterns for oral cancer detection: association rule on clinical examination and history data

Original Article


This paper presents an application of data mining in healthcare and discusses how physicians can use the generated patterns for early detection, and hence prevention, of oral cancer. Apriori, one of the popular association rule mining algorithms, is used to extract a set of significant rules from data on the clinical examination, history, and survivability of cancer patients. These rules suggest various investigations and also help predict the distribution of cancer in the oral cavity. Although clinical judgment relies on examination of the oral cavity and tongue using various diagnostic tools, the majority of cases present to healthcare setups at later stages of the tumor, so the delay in diagnosis lessens the chances of survival. The mined rules can nevertheless assist practitioners in detecting oral cancer early and in predicting its distribution in the oral cavity, which can help prevent the disease. The experimental results demonstrate that all the generated rules hold the highest confidence level, making them useful for early detection and prevention of oral cancer.
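As a rough illustration of the technique named in the abstract, the following is a minimal sketch of Apriori-style rule mining in plain Python. It is not the authors' implementation (the keywords indicate that WEKA was used), and the records, attribute names, and thresholds are hypothetical toy values chosen only to show how support and confidence are computed.

```python
from itertools import combinations

# Hypothetical toy records; attribute names are illustrative, not the paper's data.
records = [
    {"smoking=yes", "ulcer=yes", "site=tongue"},
    {"smoking=yes", "ulcer=yes", "site=tongue"},
    {"smoking=yes", "ulcer=no", "site=cheek"},
    {"smoking=no", "ulcer=yes", "site=tongue"},
]


def support(itemset, rows):
    """Fraction of rows that contain every item in itemset."""
    return sum(itemset <= row for row in rows) / len(rows)


def apriori(rows, min_support):
    """Return {frozenset: support} for all frequent itemsets, level by level."""
    items = {item for row in rows for item in row}
    level = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while level:
        # Keep only the candidates that meet the minimum support.
        current = {c: support(c, rows) for c in level}
        current = {c: s for c, s in current.items() if s >= min_support}
        frequent.update(current)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    found = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in combinations(itemset, size):
                lhs = frozenset(lhs)
                # confidence(lhs -> rhs) = support(lhs and rhs) / support(lhs)
                confidence = sup / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, sup, confidence))
    return found


if __name__ == "__main__":
    frequent_itemsets = apriori(records, min_support=0.5)
    for lhs, rhs, sup, conf in rules(frequent_itemsets, min_confidence=1.0):
        print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```

Setting `min_confidence=1.0` keeps only rules that hold in every record matching the antecedent, which is one way to read "highest confidence level"; the thresholds actually used in the study are not stated here.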


Keywords: Data mining; Association rule mining; Apriori; Oral cancer; WEKA


Copyright information

© Springer-Verlag Wien 2014

Authors and Affiliations

  1. Dr. D.Y. Patil Institute of Master of Computer Applications, University of Pune, Pune-44, India
  2. Computer Science and Engineering Department, Indian School of Mines, Dhanbad, India
