Prediction of Child Tumours from Microarray Gene Expression Data Through Parallel Gene Selection and Classification on Spark

  • Conference paper
Computational Intelligence in Data Mining

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 556)

Abstract

Microarray gene expression data play a major role in predicting chronic disease at an early stage and in identifying the most appropriate drug for treating it. Such data are too voluminous to handle easily, and not all gene expressions are needed to predict a disease. Gene selection approaches pick only the genes that play a prominent role in detecting a disease and in identifying a drug for it. To handle large gene expression data sets, gene selection algorithms can be executed on parallel programming frameworks such as Hadoop MapReduce and Spark. Paediatric cancer is a life-threatening illness that affects children aged 0–14 years, and identifying child tumours at an early stage is essential to saving children's lives. The authors therefore investigate paediatric cancer gene data to identify the optimal genes that cause cancer in children. They propose executing a parallel Chi-Square gene selection algorithm on Spark, evaluating the selected genes with parallel logistic regression and a support vector machine (SVM) for binary classification using the Spark Machine Learning library (Spark MLlib), and comparing the resulting prediction and classification accuracy. The results show that parallel Chi-Square selection followed by parallel logistic regression and SVM provides better accuracy than that obtained with the complete set of gene expression data.
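The Chi-Square gene selection step named in the abstract can be illustrated with a minimal single-machine sketch. This is not the authors' Spark implementation: it uses only the Python standard library, the function names `chi_square` and `select_top_genes` are hypothetical, the toy data are invented, and it assumes expression values have already been binned into categories (the chi-square test needs categorical features). On Spark the per-gene scoring would be distributed across workers; that part is not attempted here.

```python
from collections import Counter


def chi_square(feature, labels):
    """Pearson chi-square statistic between two categorical sequences."""
    n = len(labels)
    observed = Counter(zip(feature, labels))   # joint counts per (value, class)
    feature_totals = Counter(feature)          # marginal counts per feature value
    label_totals = Counter(labels)             # marginal counts per class
    stat = 0.0
    for value, value_count in feature_totals.items():
        for label, label_count in label_totals.items():
            expected = value_count * label_count / n
            stat += (observed[(value, label)] - expected) ** 2 / expected
    return stat


def select_top_genes(rows, labels, k):
    """Score every gene (column) and return the indices of the k best."""
    n_genes = len(rows[0])
    scores = [
        chi_square([row[g] for row in rows], labels) for g in range(n_genes)
    ]
    ranked = sorted(range(n_genes), key=lambda g: scores[g], reverse=True)
    return sorted(ranked[:k]), scores


# Toy data (invented): 6 samples x 3 genes, expression binned to low/high.
# Label 1 = tumour, 0 = normal. Gene 0 separates the classes perfectly.
rows = [
    ["high", "low", "high"],
    ["high", "high", "low"],
    ["high", "low", "low"],
    ["low", "high", "high"],
    ["low", "low", "low"],
    ["low", "high", "high"],
]
labels = [1, 1, 1, 0, 0, 0]

selected, scores = select_top_genes(rows, labels, k=1)
print(selected, scores)
```

The selected column indices would then be used to reduce the data set before training a classifier such as logistic regression or an SVM, which is the comparison the abstract describes.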


Author information

Corresponding author

Correspondence to Y. V. Lokeswari.

Copyright information

© 2017 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Lokeswari, Y.V., Jacob, S.G. (2017). Prediction of Child Tumours from Microarray Gene Expression Data Through Parallel Gene Selection and Classification on Spark. In: Behera, H., Mohapatra, D. (eds) Computational Intelligence in Data Mining. Advances in Intelligent Systems and Computing, vol 556. Springer, Singapore. https://doi.org/10.1007/978-981-10-3874-7_62

  • DOI: https://doi.org/10.1007/978-981-10-3874-7_62

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-3873-0

  • Online ISBN: 978-981-10-3874-7

  • eBook Packages: Engineering, Engineering (R0)
