Empirical Software Engineering, Volume 12, Issue 3, pp 275–293

Quality of manual data collection in Java software: an empirical investigation



Data collection, both automatic and manual, lies at the heart of all empirical studies. The quality of data collected from software informs decisions on maintenance, testing and wider issues such as the need for system re-engineering. Although automatic data collection is the preferable of the two, there are numerous occasions when manual data collection is unavoidable. Yet very little evidence exists to assess the error-proneness of the latter. Herein, we investigate how manual data collection for Java software compares with its automatic counterpart for the same data. We investigate three hypotheses relating to the difference between automated and manual data collection, using five Java systems to support the investigation. Results showed that, as expected, manual data collection was error-prone, but to nowhere near the extent we had initially envisaged. Key indicators of mistakes in manual data collection were found to be poor developer coding style, poor adherence to sound OO coding principles, and the existence of relatively large classes in some systems. Some interesting results were found relating to the collection of public class features and the types of error made during manual data collection. The study thus offers an insight into some of the typical problems associated with collecting data manually; more significantly, it highlights the detrimental effect that poorly written systems have on the quality of visually extracted data.


Keywords: Data collection, Java, Software metrics, Empirical investigation





Copyright information

© Springer Science+Business Media, LLC 2006

Authors and Affiliations

  1. School of Computing, Information Systems and Mathematics, Brunel University, Uxbridge, Middlesex, UK
  2. School of Computer Science and Information Systems, Birkbeck, University of London, London, UK
