Empirical Software Engineering, Volume 23, Issue 2, pp 645–692

Noise in Mylyn interaction traces and its impact on developers and recommendation systems

  • Zéphyrin Soh
  • Foutse Khomh
  • Yann-Gaël Guéhéneuc
  • Giuliano Antoniol


Interaction traces (ITs) are logs of developers’ activities collected while they maintain or evolve software systems. Researchers use ITs to study developers’ editing styles and to recommend relevant program entities when developers change source code. However, studies that use ITs rely on assumptions that may not hold. This article assesses the extent to which these assumptions are true and examines noise in ITs. It also investigates the impact of this noise on previous studies. It describes a quasi-experiment in which both Mylyn ITs and video-screen captures were collected while 15 participants performed four realistic software-maintenance tasks. It assesses the noise in ITs by comparing the Mylyn ITs against ITs transcribed from the video captures. It then proposes an approach to correct this noise and applies it to revisit previous studies. The collected data show that Mylyn ITs can miss, on average, about 6% of the time participants spent performing tasks and can contain, on average, about 85% false edit events, i.e., edit events that do not correspond to real changes to the source code. The noise-correction approach reveals misclassification in about 45% of the ITs. It can improve the precision and recall of recommendation systems from the literature by up to 56% and 62%, respectively. Thus, Mylyn ITs include noise that biases subsequent studies and can prevent researchers from assisting developers effectively; they must be cleaned before being used in studies and recommendation systems. These results on Mylyn ITs open new perspectives for investigating noise in ITs generated by other monitoring tools, such as DFlow, FeedBag, and Mimec, and for future studies based on ITs.
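The two measurements at the heart of the abstract, the rate of false edit events in a trace and the precision/recall of an entity recommender, can be made concrete with a small sketch. This is a hypothetical Python illustration, not the paper’s actual tooling: the tuple encoding of events, `trace_events`, and `changed_entities` (ground truth, e.g., transcribed from video captures) are assumptions for illustration only.

```python
# Hypothetical sketch of the noise and evaluation metrics discussed above.
# Events are encoded as (kind, entity) tuples; the real study works on
# Mylyn XML traces and video transcriptions.

def false_edit_rate(trace_events, changed_entities):
    """Fraction of 'edit' events whose target entity was never actually
    changed according to the ground truth (here, a set of entity names)."""
    edits = [entity for kind, entity in trace_events if kind == "edit"]
    if not edits:
        return 0.0
    false_edits = [e for e in edits if e not in changed_entities]
    return len(false_edits) / len(edits)

def precision_recall(recommended, relevant):
    """Standard precision/recall of recommended entities against the
    entities actually relevant to the task."""
    rec, rel = set(recommended), set(relevant)
    true_positives = len(rec & rel)
    precision = true_positives / len(rec) if rec else 0.0
    recall = true_positives / len(rel) if rel else 0.0
    return precision, recall

# Example: four edit events appear in the trace, but only A.java was
# really changed, so B.java and D.java are false edit events.
trace = [("edit", "A.java"), ("selection", "C.java"),
         ("edit", "B.java"), ("edit", "A.java"), ("edit", "D.java")]
rate = false_edit_rate(trace, {"A.java"})
```

Cleaning a trace before evaluation amounts to filtering events so that `false_edit_rate` drops, which in turn changes the `recommended` sets a recommender produces and hence its precision and recall.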


Software maintenance Mylyn Interaction traces Noise Editing behaviour Recommendation systems 



The authors greatly thank the participants of the experiments described in this article. This study would not have been possible without their participation. Many thanks also go to intern students Thomas Drioul and Pierre-Antoine Rappe, who transcribed the video captures. Thanks to Seonah Lee and colleagues for providing us with the data and the implementation of their recommendation system. We also thank Annie Ying and Martin Robillard for making their script publicly available for use. The authors are grateful to the editors and anonymous reviewers for their detailed feedback and useful suggestions to improve the earlier version of this article. This work has been partially funded by an NSERC Discovery grant and the Canada Research Chairs on Multi-language Patterns and on Software Change and Evolution.


  1. Amann S, Proksch S, Nadi S (2016) FeedBag: an interaction tracker for Visual Studio. In: 24th IEEE International Conference on Program Comprehension (ICPC 2016), pp 1–3
  2. Bantelay F, Zanjani M, Kagdi H (2013) Comparing and combining evolutionary couplings from interactions and commits. In: 20th Working Conference on Reverse Engineering (WCRE 2013), pp 311–320
  3. Beller M, Gousios G, Panichella A, Zaidman A (2015) When, how, and why developers (do not) test in their IDEs. In: Proceedings of the 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE 2015), pp 179–190
  4. Bouckaert RR, Frank E, Hall M, Kirkby R, Reutemann P, Seewald A, Scuse D (2013) WEKA manual for version 3-7-8
  5. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Int Res 16(1):321–357
  6. DeLine R, Czerwinski M, Robertson G (2005) Easing program comprehension by sharing navigation data. In: IEEE Symposium on Visual Languages and Human-Centric Computing, pp 241–248
  7. Fritz T, Shepherd DC, Kevic K, Snipes W, Bräunlich C (2014) Developers’ code context models for change tasks. In: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE 2014), pp 7–18
  8. Kamei Y, Monden A, Matsumoto S, Kakimoto T, Matsumoto KI (2007) The effects of over and under sampling on fault-prone module detection. In: First International Symposium on Empirical Software Engineering and Measurement (ESEM 2007), pp 196–204
  9. Kersten M, Murphy GC (2005) Mylar: a degree-of-interest model for IDEs. In: Proceedings of the 4th International Conference on Aspect-Oriented Software Development (AOSD ’05), pp 159–168
  10. Kersten M, Murphy GC (2006) Using task context to improve programmer productivity. In: Proceedings of the 14th ACM SIGSOFT/FSE, pp 1–11
  11. Ko A, Myers B, Coblenz M, Aung H (2006) An exploratory study of how developers seek, relate, and collect relevant information during software maintenance tasks. IEEE Trans Softw Eng 32(12):971–987
  12. Kuhn M (2016) caret: classification and regression training
  13. Kuhn M (2008) Building predictive models in R using the caret package. J Stat Softw 28(5):1–26
  14. Layman LM (2009) Information needs of developers for program comprehension during software maintenance tasks. PhD thesis, North Carolina State University
  15. Layman LM, Williams LA, St. Amant R (2008) MimEc: intelligent user notification of faults in the Eclipse IDE. In: Proceedings of the 2008 International Workshop on Cooperative and Human Aspects of Software Engineering (CHASE ’08), pp 73–76
  16. Lee S, Kang S (2013) Clustering navigation sequences to create contexts for guiding code navigation. J Syst Softw
  17. Lee S, Kang S, Kim S, Staats M (2015) The impact of view histories on edit recommendations. IEEE Trans Softw Eng 41(3):314–330
  18. Minelli R, Mocci A, Lanza M, Kobayashi T (2014) Quantifying program comprehension with interaction data. In: 14th International Conference on Quality Software (QSIC 2014)
  19. Murphy GC, Kersten M, Findlater L (2006) How are Java software developers using the Eclipse IDE? IEEE Softw 23(4):76–83
  20. Parnin C, Rugaber S (2011) Resumption strategies for interrupted programming tasks. Softw Qual J 19(1):5–34
  21. Robbes R, Lanza M (2010) Improving code completion with program history. Autom Softw Eng 17(2):181–212
  22. Robbes R, Röthlisberger D (2013) Using developer interaction data to compare expertise metrics. In: Proceedings of MSR, pp 297–300
  23. Robillard M, Walker R, Zimmermann T (2010) Recommendation systems for software engineering. IEEE Softw 27(4):80–86
  24. Romano J, Kromrey JD, Coraggio J, Skowronek J (2006) Appropriate statistics for ordinal level data: should we really be using t-test and Cohen’s d for evaluating group differences on the NSSE and other surveys. In: Annual Meeting of the Florida Association of Institutional Research
  25. Rousseeuw P (1987) Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math 20(1):53–65
  26. Sanchez H, Robbes R, Gonzalez VM (2015) An empirical study of work fragmentation in software evolution tasks. In: Proceedings of SANER, pp 251–260
  27. Singer J, Elves R, Storey MA (2005) NavTracks: supporting navigation in software maintenance. In: International Conference on Software Maintenance, pp 325–334
  28. Soh Z, Drioul T, Rappe PA, Khomh F, Gueheneuc YG, Habra N (2015) Noises in interaction traces data and their impact on previous research studies. In: 9th International Symposium on Empirical Software Engineering and Measurement. To appear
  29. Soh Z, Khomh F, Gueheneuc YG, Antoniol G (2013) Towards understanding how developers spend their effort during maintenance activities. In: 20th Working Conference on Reverse Engineering (WCRE 2013), pp 152–161
  30. Soh Z, Khomh F, Gueheneuc YG, Antoniol G, Adams B (2013) On the effect of program exploration on maintenance tasks. In: 20th Working Conference on Reverse Engineering (WCRE 2013), pp 391–400
  31. Tan PN, Steinbach M, Kumar V (2006) Introduction to data mining, chap. 6: Association analysis: basic concepts and algorithms. Pearson
  32. Tantithamthavorn C, McIntosh S, Hassan AE, Matsumoto K (2016) Automated parameter optimization of classification techniques for defect prediction models. In: Proceedings of the 38th International Conference on Software Engineering (ICSE ’16), pp 321–332
  33. Witten IH, Frank E (2005) Data mining: practical machine learning tools and techniques. Morgan Kaufmann Publishers Inc
  34. Wohlin C, Runeson P, Höst M, Ohlsson MC, Regnell B, Wesslén A (2000) Experimentation in software engineering: an introduction. Kluwer Academic Publishers
  35. Ying A, Robillard M (2011) The influence of the task on programmer behaviour. In: Proceedings of ICPC, pp 31–40
  36. Zanjani MB, Swartzendruber G, Kagdi H (2014) Impact analysis of change requests on source code based on interaction and commit histories. In: Proceedings of the 11th Working Conference on Mining Software Repositories (MSR 2014), pp 162–171
  37. Zhang F, Khomh F, Zou Y, Hassan AE (2012) An empirical study of the effect of file editing patterns on software quality. In: Proceedings of WCRE, pp 456–465
  38. Zhang F, Zheng Q, Zou Y, Hassan AE (2016) Cross-project defect prediction using a connectivity-based unsupervised classifier. In: Proceedings of the 38th International Conference on Software Engineering (ICSE ’16), pp 309–320
  39. Zimmermann T, Weißgerber P, Diehl S, Zeller A (2005) Mining version histories to guide software changes. IEEE Trans Softw Eng 31(6):429–445

Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Ptidej Team, Polytechnique Montréal, Montreal, Canada
  2. SWAT Lab, Polytechnique Montréal, Montreal, Canada
  3. Soccer Lab, Polytechnique Montréal, Montreal, Canada
