Discovery of Popular Languages from GitHub Repository: A Data Mining Approach

Upadhya, K. Jyothi; Rao, B. Dinesh; Geetha, M.

doi:10.1007/978-981-16-7088-6_14

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1413)

Included in the following conference series:

International Conference on Soft Computing and Signal Processing

Abstract

The use of Open Source Software (OSS) has increased over the past fifteen years among programmers and computer users. OSS communities work as a “Bazaar” where project developers and end users meet and search for suitable matches to their skills and requirements. OSS is emerging as a strong competitor to commercial, closed-source software. GitHub is an OSS forge started in 2008 to simplify code sharing. It is a website and cloud-based service that helps software developers store, manage, track, and control changes to their code. When a GitHub project fails, the time, effort, and resources invested by this large community are lost. There is therefore a need for models that identify the factors that contribute to the success of these projects. The massive repositories make this domain a good candidate for exploratory research using a data mining approach. In this work, the FP-Growth method is used to find the popular combinations of two programming languages, and the results are validated using the SPSS tool. The outcome of this work benefits the OSS community by saving time and resources.
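To make the FP-Growth step concrete, the following is a minimal sketch of how frequent two-language combinations could be mined from per-repository language sets. It is not the authors' implementation: the sample repositories, the `min_count` threshold, and the helper names (`Node`, `build_tree`, `fp_growth`) are all assumptions made for illustration. Each repository is treated as one transaction whose items are the languages it uses.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, a count, and a link to its parent."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}


def build_tree(weighted, min_count):
    """Build an FP-tree from (items, weight) pairs; return header table and item counts."""
    counts = defaultdict(int)
    for items, w in weighted:
        for it in set(items):
            counts[it] += w
    frequent = {it: c for it, c in counts.items() if c >= min_count}
    root, header = Node(None, None), defaultdict(list)
    for items, w in weighted:
        # Keep only frequent items, most frequent first (ties broken by name).
        ordered = sorted((it for it in set(items) if it in frequent),
                         key=lambda it: (-frequent[it], it))
        node = root
        for it in ordered:
            child = node.children.get(it)
            if child is None:
                child = Node(it, node)
                node.children[it] = child
                header[it].append(child)
            child.count += w
            node = child
    return header, frequent


def fp_growth(weighted, min_count, suffix=frozenset()):
    """Return {frozenset(itemset): support count} for every frequent itemset."""
    header, frequent = build_tree(weighted, min_count)
    result = {}
    for item, nodes in header.items():
        itemset = suffix | {item}
        result[itemset] = frequent[item]
        # Conditional pattern base: the prefix path above each occurrence of item.
        base = []
        for node in nodes:
            path, p = [], node.parent
            while p is not None and p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, node.count))
        result.update(fp_growth(base, min_count, itemset))
    return result


# Hypothetical data: the set of languages used in each of six repositories.
repos = [
    {"JavaScript", "HTML", "CSS"},
    {"JavaScript", "HTML"},
    {"Python", "Shell"},
    {"JavaScript", "CSS", "HTML"},
    {"Python", "JavaScript"},
    {"Python", "Shell", "Makefile"},
]

itemsets = fp_growth([(list(r), 1) for r in repos], min_count=2)
pairs = {s: c for s, c in itemsets.items() if len(s) == 2}
for langs, count in sorted(pairs.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(langs), count)
```

Filtering the mined itemsets to those of size two gives the language pairs whose support meets the threshold. On real data the threshold would more likely be a fraction of all repositories than an absolute count.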

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India
K. Jyothi Upadhya & M. Geetha
Manipal School of Information Sciences, Manipal Academy of Higher Education, Manipal, Karnataka, 576104, India
B. Dinesh Rao

Corresponding author

Correspondence to K. Jyothi Upadhya.

Editor information

Editors and Affiliations

Department of Electronics and Communication Engineering, Malla Reddy College of Engineering and Technology, Hyderabad, Telangana, India
V. Sivakumar Reddy
Department of Computer Science and Engineering, Jawaharlal Nehru Technological University Hyderabad, Hyderabad, Telangana, India
V. Kamakshi Prasad
Department of Computer Science and Software Engineering, Monmouth University, New Jersey, NJ, USA
Jiacun Wang
Department of Electronics and Communication Engineering, Sir Visvesvaraya Institute of Technology, Nashik, Maharashtra, India
K.T.V. Reddy

About this paper

Cite this paper

Upadhya, K.J., Rao, B.D., Geetha, M. (2022). Discovery of Popular Languages from GitHub Repository: A Data Mining Approach. In: Reddy, V.S., Prasad, V.K., Wang, J., Reddy, K. (eds) Soft Computing and Signal Processing. ICSCSP 2021. Advances in Intelligent Systems and Computing, vol 1413. Springer, Singapore. https://doi.org/10.1007/978-981-16-7088-6_14

DOI: https://doi.org/10.1007/978-981-16-7088-6_14
Published: 15 February 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-7087-9
Online ISBN: 978-981-16-7088-6
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)