Target Gene Mining Algorithm Based on gSpan

  • Liangfu Lu
  • Xiaoxu Ren
  • Lianyong Qi (corresponding author)
  • Chenming Cui
  • Yichen Jiao
Conference paper
Part of the Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering book series (LNICST, volume 268)


In recent years, the focus of bioinformatics research has turned to biological data processing and information extraction. This paper presents a new mining algorithm designed to efficiently extract target gene fragments from large volumes of gene data and to study specific gene expression. The extracted gene data are first filtered to remove redundant entries. A binary tree is then constructed according to the Pearson correlation coefficient between gene data and processed with the gSpan frequent subgraph mining algorithm. Finally, the results are visualized as grayscale images, which helps identify the target gene. Compared with existing target gene mining algorithms, such as the integrated decision feature gene selection algorithm, our approach offers higher accuracy and can process high-dimensional data. The proposed algorithm has a sound theoretical basis; it obtains results more efficiently and reduces the likelihood of erroneous results. Moreover, the data it handles have a much higher dimension than the data sets used by existing algorithms, which makes the algorithm more practical.
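The Pearson correlation step and the grayscale rendering mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the filtering rule, the binary tree construction, or the gSpan configuration, so none of those are reproduced here. The gene names, the toy expression values, and the helper names `pearson`, `correlation_matrix`, and `to_gray` are all hypothetical, and the linear mapping of a coefficient to a gray level is an assumption rather than the authors' method.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    var_x = sum(a * a for a in dx)
    var_y = sum(b * b for b in dy)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def correlation_matrix(profiles):
    """Pairwise Pearson coefficients for a dict of name -> expression profile."""
    names = list(profiles)
    return {a: {b: pearson(profiles[a], profiles[b]) for b in names}
            for a in names}


def to_gray(r):
    """Assumed mapping: coefficient in [-1, 1] -> gray level in 0..255."""
    r = max(-1.0, min(1.0, r))  # guard against floating-point overshoot
    return round((r + 1.0) / 2.0 * 255)


# Hypothetical toy expression profiles (not data from the paper).
profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.0, 4.0, 6.0, 8.0],
    "gene_c": [4.0, 3.0, 2.0, 1.0],
}

corr = correlation_matrix(profiles)
gray = {a: {b: to_gray(r) for b, r in row.items()} for a, row in corr.items()}

for name, row in gray.items():
    print(name, [row[other] for other in profiles])
```

In this toy data, `gene_b` is a scaled copy of `gene_a` and `gene_c` is its reverse, so the pairs sit at the two extremes of the gray scale. With real expression data, the matrix `corr` would be the kind of pairwise similarity information from which a correlation-based structure could be built.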


Keywords: gSpan gene mining algorithm; Gene expression data; Data mining; Visual analysis



This work was partially supported by the National Natural Science Foundation of China under Grant No. 51877144 and Grant No. 61872219.



Copyright information

© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2019

Authors and Affiliations

  • Liangfu Lu (1)
  • Xiaoxu Ren (1)
  • Lianyong Qi (2, corresponding author)
  • Chenming Cui (3)
  • Yichen Jiao (3)

  1. School of Mathematics, Tianjin University, Tianjin, China
  2. School of Information Science and Engineering, Qufu Normal University, Rizhao, China
  3. School of Software, Tianjin University, Tianjin, China
