Improving Metagenome Sequence Clustering Application Performance Using Louvain Algorithm

  • Conference paper
Recent Featured Applications of Artificial Intelligence Methods. LSMS 2020 and ICSEE 2020 Workshops (LSMS 2020, ICSEE 2020)

Abstract

Metagenomic assembly is very challenging because of the huge data volume produced by next-generation sequencing (NGS). Because a clustering strategy can handle large amounts of data, it is an ideal way around memory limitations. SpaRC (Spark Reads Clustering), a scalable sequence clustering tool built on Apache Spark, a distributed big data analysis platform, can cluster hundreds of GB of sequences from different genomes. However, the Label Propagation Algorithm (LPA) used in SpaRC is usually unstable, which causes the clustering results to oscillate and to contain too many tiny clusters. In this paper, we propose a method for clustering metagenomic sequences based on the distributed Louvain algorithm to obtain more accurate clustering results. We ran experiments with both LPA and Louvain on two different datasets containing millions of genome sequences. The experimental results indicate that this approach can effectively improve clustering performance. We hope that the method applied in this paper can be widely used in other metagenomic clustering studies.
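As a rough illustration of why a modularity-based method such as Louvain tends to avoid many tiny clusters, the sketch below computes Newman modularity, the objective that Louvain greedily optimizes, for two partitions of a toy graph. It is not the paper's distributed implementation: the function name `modularity`, the six-node graph, and both partitions are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def modularity(edges, partition):
    """Newman modularity Q of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    partition: dict mapping each node to its cluster label.
    """
    m = len(edges)
    internal = defaultdict(int)    # edges with both endpoints in the cluster
    degree_sum = defaultdict(int)  # summed degree of the cluster's nodes
    for u, v in edges:
        degree_sum[partition[u]] += 1
        degree_sum[partition[v]] += 1
        if partition[u] == partition[v]:
            internal[partition[u]] += 1
    # Q = sum over clusters of (internal edge fraction - expected fraction)
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Hypothetical toy graph: two triangles of "reads" joined by a single edge.
TOY_EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

# One cluster per triangle versus a fragmented result with tiny clusters.
two_clusters = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
fragmented = {0: "a", 1: "a", 2: "c", 3: "d", 4: "b", 5: "b"}

q_two = modularity(TOY_EDGES, two_clusters)
q_fragmented = modularity(TOY_EDGES, fragmented)
print(f"two clusters: Q = {q_two:.4f}")
print(f"fragmented:   Q = {q_fragmented:.4f}")
```

On this toy graph the fragmented partition scores lower, so a method that maximizes Q would prefer the two-cluster result. LPA has no such global objective, which is consistent with the instability described in the abstract.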

Acknowledgments

This work is supported by National Natural Science Foundation (NNSF) of China under Grant 61802246.

Corresponding author

Correspondence to Li Deng.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Lu, Y., Deng, L., Wang, L., Li, K., Wu, J. (2020). Improving Metagenome Sequence Clustering Application Performance Using Louvain Algorithm. In: Fei, M., Li, K., Yang, Z., Niu, Q., Li, X. (eds) Recent Featured Applications of Artificial Intelligence Methods. LSMS 2020 and ICSEE 2020 Workshops. LSMS ICSEE 2020 2020. Communications in Computer and Information Science, vol 1303. Springer, Singapore. https://doi.org/10.1007/978-981-33-6378-6_29

  • DOI: https://doi.org/10.1007/978-981-33-6378-6_29

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-33-6377-9

  • Online ISBN: 978-981-33-6378-6

  • eBook Packages: Computer Science, Computer Science (R0)
