Discovery of Popular Languages from GitHub Repository: A Data Mining Approach

  • Conference paper
Soft Computing and Signal Processing (ICSCSP 2021)

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 1413))

Abstract

Usage of Open Source Software (OSS) has increased over the past fifteen years among programmers and computer users. OSS communities work as a “Bazaar” where project constructors and end-users meet and search for suitable matches to their skills and requirements. OSS is emerging as a strong competitor to commercial or closed software. GitHub is an OSS forge started in 2008 to simplify code sharing. It is a Web site and cloud-based service that helps software developers store, manage, track, and control changes to their code. When a GitHub project fails, the time, effort, and resources of this large community are lost. The current need is to build models that find interesting factors that contribute to the success of these projects. The massive repositories make this domain a good candidate for exploratory research using a data mining approach. In this work, the FP-Growth method is used to find popular combinations of two programming languages, and the results are validated using the SPSS tool. The outcome of this work benefits the OSS community in terms of time and resources.
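
To make the idea of a "popular two-language combination" concrete, the following is a minimal sketch, not the authors' implementation. It does not build an FP-tree; it counts language co-occurrences directly, which for itemsets of size two should yield the same frequent pairs that a frequent-itemset miner such as FP-Growth reports at the same support threshold. The repository data, the function name `frequent_language_pairs`, and the threshold of 0.4 are all hypothetical, chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each set lists the languages used in one repository.
repos = [
    {"JavaScript", "HTML", "CSS"},
    {"JavaScript", "HTML"},
    {"Python", "Shell"},
    {"JavaScript", "CSS"},
    {"Python", "JavaScript", "HTML"},
]


def frequent_language_pairs(transactions, min_support):
    """Return {pair: support} for language pairs with support >= min_support.

    Support is the fraction of repositories that contain both languages.
    """
    counts = Counter()
    for langs in transactions:
        # Sort first so every pair has a single canonical ordering.
        for pair in combinations(sorted(langs), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Assumed threshold: a pair must appear in at least 40% of repositories.
pairs = frequent_language_pairs(repos, min_support=0.4)
for pair, support in sorted(pairs.items(), key=lambda kv: -kv[1]):
    print(pair, support)
```

On this toy input, only the HTML/JavaScript and CSS/JavaScript pairs meet the threshold. On data at the scale of GitHub, a tree-based miner is the more practical choice because it avoids enumerating every pair in every transaction.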


Author information

Corresponding author

Correspondence to K. Jyothi Upadhya.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Upadhya, K.J., Rao, B.D., Geetha, M. (2022). Discovery of Popular Languages from GitHub Repository: A Data Mining Approach. In: Reddy, V.S., Prasad, V.K., Wang, J., Reddy, K. (eds) Soft Computing and Signal Processing. ICSCSP 2021. Advances in Intelligent Systems and Computing, vol 1413. Springer, Singapore. https://doi.org/10.1007/978-981-16-7088-6_14
