
Significant patterns for oral cancer detection: association rule on clinical examination and history data


This paper presents an application of data mining in healthcare and discusses how physicians can use the generated patterns for early detection, and hence prevention, of oral cancer. Apriori, one of the popular association rule mining algorithms, is used to extract a set of significant rules from data on the clinical examination, history, and survivability of cancer patients. These rules suggest various investigations and also help predict the distribution of cancer in the oral cavity. Although clinical judgment is made by examining the oral cavity and tongue with various diagnostic tools, the majority of cases present to healthcare setups at later stages of tumor subtypes, and the delay in diagnosis lessens the chances of survival. The mined rules would nevertheless assist practitioners in detecting oral cancer early and in predicting its distribution in the oral cavity, which can help prevent the disease. The experimental results demonstrate that all the generated rules hold the highest confidence level, making them useful for early detection and prevention of oral cancer.
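To make the terms "association rule" and "confidence" concrete, here is a minimal sketch of Apriori in plain Python. It is not the authors' implementation or workflow. The attribute names (tobacco_use, ulcer_present, biopsy_positive), the five records, and the thresholds are all invented for illustration and carry no clinical meaning. Support is the fraction of records containing an itemset; the confidence of a rule A -> B is support(A and B) divided by support(A).

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    # Level-1 candidates: every distinct item seen in the data.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count support and keep only candidates that meet the threshold.
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / n
            if support >= min_support:
                level[c] = support
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    found = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so frequent[a] is always present.
                confidence = support / frequent[a]
                if confidence >= min_confidence:
                    found.append((a, itemset - a, support, confidence))
    return found


# Hypothetical records, invented for this sketch only.
records = [
    {"tobacco_use", "ulcer_present", "biopsy_positive"},
    {"tobacco_use", "ulcer_present", "biopsy_positive"},
    {"tobacco_use", "biopsy_positive"},
    {"ulcer_present"},
    {"tobacco_use", "ulcer_present", "biopsy_positive"},
]

frequent = apriori(records, min_support=0.5)
found = rules(frequent, min_confidence=1.0)
for antecedent, consequent, support, confidence in found:
    print(sorted(antecedent), "->", sorted(consequent), support, confidence)
```

Setting min_confidence to 1.0 keeps only rules that hold in every record matching the antecedent, which is one way to read the abstract's "highest confidence level"; the thresholds actually used in the paper are not stated here.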





The authors would like to thank the management and staff of the Indian School of Mines for their constant support and motivation.

Author information


Corresponding author

Correspondence to Neha Sharma.


About this article

Cite this article

Sharma, N., Om, H. Significant patterns for oral cancer detection: association rule on clinical examination and history data. Netw Model Anal Health Inform Bioinforma 3, 50 (2014).



  • Data mining
  • Association rule mining
  • Apriori
  • Oral cancer
  • WEKA