Detecting Outliers in Terms of Errors in Embedded Software Development Projects Using Imbalanced Data Classification

  • Kazunori Iwata
  • Toyoshiro Nakashima
  • Yoshiyuki Anan
  • Naohiro Ishii
Part of the Studies in Computational Intelligence book series (SCI, volume 726)


This study examines the effect of undersampling on the detection of outliers, defined in terms of the number of errors, in embedded software development projects. Our broader aim is to estimate the number of errors and the amount of effort in such projects. Because outliers can adversely affect this estimation, many estimation models exclude them. In practice, however, outliers can be identified only after a project has been completed, so they should not be excluded when constructing models and estimating errors or effort. We have previously attempted to detect outliers, but the classification accuracy was not acceptable because the number of outliers was small. This problem is referred to as data imbalance. To mitigate it, we explore rebalancing methods based on k-means cluster-based undersampling. These methods aim to increase the proportion of outliers that are correctly identified while keeping the other classification performance metrics high. Evaluation experiments show that the proposed methods can improve the accuracy of detecting outliers; however, they also classify too many samples as outliers.
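The rebalancing idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one common variant of k-means cluster-based undersampling, in which the majority class (non-outlier projects) is clustered into as many groups as there are minority samples (outliers) and only the member nearest each centroid is kept. The function names `kmeans` and `kmeans_undersample`, the two-dimensional features, and the synthetic data are all hypothetical.

```python
import random
from math import dist  # available in Python 3.8+


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns (centroids, assignments)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k of the samples
    assign = None
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every point.
        new_assign = [min(range(k), key=lambda c: dist(p, centroids[c]))
                      for p in points]
        if new_assign == assign:  # no change, so the clustering has converged
            break
        assign = new_assign
        # Update step: move each non-empty cluster's centroid to its mean.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, assign


def kmeans_undersample(majority, n_keep, seed=0):
    """Keep at most n_keep majority samples: the one nearest each centroid.

    Assumption: this 'nearest to centroid' rule is one of several possible
    selection rules; the source does not state which one the authors use.
    """
    centroids, assign = kmeans(majority, n_keep, seed=seed)
    kept = []
    for c, centre in enumerate(centroids):
        members = [p for p, a in zip(majority, assign) if a == c]
        if members:  # a cluster can end up empty; skip it in that case
            kept.append(min(members, key=lambda p: dist(p, centre)))
    return kept


# Synthetic, purely illustrative data: 60 "normal" projects and 6 "outliers".
rng = random.Random(1)
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)]
outliers = [(rng.gauss(4, 1), rng.gauss(4, 1)) for _ in range(6)]

kept = kmeans_undersample(normal, n_keep=len(outliers))
balanced = [(p, 0) for p in kept] + [(p, 1) for p in outliers]
print(len(normal), "majority samples reduced to", len(kept))
```

The resulting `balanced` list would then be passed to a classifier such as a support vector machine. Keeping real samples rather than the centroids themselves is a design choice in this sketch, so that the classifier is trained only on observed projects.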


Keywords: Embedded software; Imbalanced dataset; Support vector machine; k-means clustering algorithms; Undersampling



This work was supported by JSPS KAKENHI Grant Numbers JP16K00310 and JP17K00317.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Kazunori Iwata (1)
  • Toyoshiro Nakashima (2, 3)
  • Yoshiyuki Anan (4)
  • Naohiro Ishii (5)
  1. Department of Business Administration, Aichi University, Nagoya, Aichi, Japan
  2. Department of Culture-Information Studies, Sugiyama Jogakuen University, Chikusa-ku, Nagoya, Aichi, Japan
  3. Institute of Managerial Research, Aichi University, Nakamura-ku, Nagoya, Aichi, Japan
  4. Base Division, Omron Software Co., Ltd., Shimogyo-ku, Kyoto, Japan
  5. Department of Information Science, Aichi Institute of Technology, Toyota, Aichi, Japan
