
Cost Adjustment for Software Crowdsourcing Tasks Using Ensemble Effort Estimation and Topic Modeling

  • Research Article – Computer Engineering and Computer Science
  • Published in: Arabian Journal for Science and Engineering

Abstract

Crowdsourced software development (CSSD) has grown rapidly among software practitioners and researchers over the last two decades. Despite this favorable environment, no intelligent mechanism exists for pricing CSSD tasks. Software development effort estimation (SDEE), on the other hand, is an established field in traditional software engineering. SDEE is largely facilitated by machine learning (ML), particularly ML-based ensemble effort estimation (EEE), which targets accurate estimates by avoiding the biases of any single ML model. This accuracy can be exploited by CSSD platforms to establish an intelligent cost-assignment mechanism. This study integrates EEE with a CSSD platform to provide a justified costing solution for crowdsourced tasks. An effort-based cost estimation model is proposed that applies EEE to predict task effort, together with natural language processing (NLP) analysis of the task's textual description, to assign an effort-based cost. TopCoder is selected as the target CSSD platform, and the proposed scheme is implemented on the TopCoder QA category, which comprises software testing tasks. Ensemble prediction is realized with random forest, support vector machine, and neural network base learners. LDA topic modeling is used for NLP analysis of the textual aspects of CSSD tasks, with specific emphasis on testing and technology factors. Effort estimation results confirm that the EEE models, particularly the stacking and weighted ensembles, surpass their base learners with 50% higher overall accuracy. Moreover, R2, log-likelihood, and topic-quality measures confirm considerable significance of the LDA model. The findings confirm that the cost adjustment obtained from EEE and NLP defines an acceptable price range covering major testing aspects.
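The pipeline the abstract describes — a stacking ensemble of random forest, SVM, and neural network regressors for effort prediction, plus LDA topic modeling on task descriptions — can be sketched with scikit-learn. This is a minimal illustrative sketch on synthetic data, not the paper's exact implementation; the feature names, example task descriptions, and hyperparameters are assumptions for demonstration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LinearRegression
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Synthetic stand-in for TopCoder QA task features and effort labels
# (hypothetical features, e.g. task size, duration, technology count).
rng = np.random.default_rng(0)
X = rng.random((80, 4))
y = X @ np.array([3.0, 1.5, 0.5, 2.0]) + rng.normal(0, 0.1, 80)

# Stacking ensemble: RF, SVM and NN base learners, linear meta-learner
# combines their out-of-fold predictions into the final effort estimate.
ensemble = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("svm", SVR(kernel="rbf")),
        ("nn", MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000,
                            random_state=0)),
    ],
    final_estimator=LinearRegression(),
)
ensemble.fit(X, y)
effort = ensemble.predict(X[:5])  # predicted effort for five tasks

# LDA topic modeling on (made-up) task descriptions: each row of
# topic_dist gives a task's mixture over the discovered topics.
docs = [
    "regression testing of login api endpoints",
    "performance testing under concurrent user load",
    "unit testing for the payment module in java",
    "load testing and stress testing of the server",
]
tf = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(tf)
topic_dist = lda.transform(tf)  # rows sum to 1
```

The per-task topic weights from `topic_dist` can then be mapped to testing and technology factors to adjust the ensemble's effort-based cost, which is the role NLP analysis plays in the proposed scheme.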


Notes

  1. https://www.octoparse.com/

  2. https://drive.google.com/drive/folders/1Vq0wVygZ7ugzdeAHuVhXi4rjJQTs4kFG?usp=sharing

  3. https://slcladal.github.io/resources/stopwords_en.txt


Author information

Corresponding author: Anum Yasmin.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yasmin, A. Cost Adjustment for Software Crowdsourcing Tasks Using Ensemble Effort Estimation and Topic Modeling. Arab J Sci Eng 49, 12693–12728 (2024). https://doi.org/10.1007/s13369-024-08746-8
