Abstract
Mining software repositories is a common activity in software engineering with diverse use cases such as understanding project quality, technology usage, and developer profiles. Such mining activities involve, more often than not, a phase for data extraction from the source code in the repository with recurring tasks such as processing the folder structure (possibly on the timeline), classifying repository artifacts (e.g., in terms of the languages or technologies used), and extracting facts from the artifacts by parsing or otherwise. We describe a new approach for such data extraction; its key pillar is a declarative rule-based language for the uniform, inference-based extraction of facts from the repository (the file system), the artifacts in the repository (their content), and previously extracted facts. All inferred facts are maintained in a triple store. We describe a case study for the purpose of understanding the usage of EMF. To this end, we describe an emerging catalog of patterns of using EMF in repositories and we detect these patterns on GitHub. In our implementation, we use Apache Jena for which we provide dedicated language support tailored towards mining software repositories.
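The inference-based extraction described above can be illustrated with a minimal sketch. It is written in plain Python and uses neither the authors' rule language nor the Apache Jena API. The predicate names (partOf, elementOf, uses), the extension-to-language mapping, and the sample repository are all hypothetical. Rules derive new triples from the file paths, from file content, and from previously derived triples until nothing new is added.

```python
from pathlib import PurePosixPath

# Hypothetical predicate names; the abstract does not show the real vocabulary.
PART_OF, ELEMENT_OF, USES = "partOf", "elementOf", "uses"


def file_facts(paths):
    """Base facts from the repository layout: one partOf triple per path."""
    return {(p, PART_OF, str(PurePosixPath(p).parent)) for p in paths}


def rule_classify_by_extension(store, contents):
    """Classify artifacts by file extension (assumed mapping)."""
    languages = {".ecore": "Ecore", ".genmodel": "GenModel", ".java": "Java"}
    new = set()
    for subject, predicate, _ in store:
        if predicate == PART_OF:
            language = languages.get(PurePosixPath(subject).suffix)
            if language:
                new.add((subject, ELEMENT_OF, language))
    return new


def rule_java_uses_emf(store, contents):
    """Inspect the content of artifacts already classified as Java."""
    new = set()
    for subject, predicate, obj in store:
        if predicate == ELEMENT_OF and obj == "Java":
            if "org.eclipse.emf" in contents.get(subject, ""):
                new.add((subject, USES, "EMF"))
    return new


def infer(paths, contents, rules):
    """Apply all rules repeatedly until no new triple is derived."""
    store = file_facts(paths)
    while True:
        derived = set()
        for rule in rules:
            derived |= rule(store, contents)
        if derived <= store:
            return store
        store |= derived


# Hypothetical repository snapshot: paths plus the content of one file.
paths = ["model/shop.ecore", "src/shop/Order.java", "README.md"]
contents = {"src/shop/Order.java": "import org.eclipse.emf.ecore.EObject;"}

# The content rule is listed first on purpose: the fixpoint loop makes order irrelevant.
facts = infer(paths, contents, [rule_java_uses_emf, rule_classify_by_extension])
```

The second rule fires only after the first has classified a file as Java, which is the sense in which facts are inferred from previously extracted facts. A real triple store with a rule engine would take over the fixpoint loop that `infer` implements by hand here.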
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Härtel, J., Heinz, M., Lämmel, R. (2018). EMF Patterns of Usage on GitHub. In: Pierantonio, A., Trujillo, S. (eds) Modelling Foundations and Applications. ECMFA 2018. Lecture Notes in Computer Science, vol 10890. Springer, Cham. https://doi.org/10.1007/978-3-319-92997-2_14
DOI: https://doi.org/10.1007/978-3-319-92997-2_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-92996-5
Online ISBN: 978-3-319-92997-2
eBook Packages: Computer Science (R0)