Abstract
Crowdsourced software development (CSSD) has grown rapidly among software practitioners and researchers over the last two decades. Despite this favorable environment, no intelligent mechanism exists for assigning prices to CSSD tasks. Software development effort estimation (SDEE), on the other hand, is an established field in traditional software engineering. SDEE is largely facilitated by machine learning (ML), particularly ML-based ensemble effort estimation (EEE), which targets accurate estimates by avoiding the biases of any single ML model. This accuracy can be exploited by CSSD platforms to establish an intelligent cost-assignment mechanism. This study aims to integrate EEE with a CSSD platform to provide a justified costing solution for crowdsourced tasks. An effort-based cost estimation model is proposed that implements EEE to predict a task's effort, combined with natural language processing (NLP) analysis of the task's textual description, to assign an effort-based cost. TopCoder is selected as the target CSSD platform, and the proposed scheme is applied to the TopCoder QA category, which comprises software testing tasks. Ensemble prediction is realized with random forest, support vector machine and neural network base learners. LDA topic modeling is used for NLP analysis of the textual aspects of CSSD tasks, with specific emphasis on testing and technology factors. Effort estimation results confirm that the EEE models, particularly the stacking and weighted ensembles, surpass their base learners with a 50% overall increase in accuracy. Moreover, R², log-likelihood and topic quality measures confirm considerable significance of the LDA model. The findings confirm that the cost adjustment obtained from EEE and NLP defines an acceptable price range covering the major testing aspects.
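As an illustration of the pipeline summarized above (not the authors' implementation), the following is a minimal sketch of stacking and weighted ensembles over random forest, SVM and neural network base learners in Python with scikit-learn. Synthetic data stands in for the TopCoder QA task features and effort values, and the weights and hyperparameters are placeholders, not the paper's tuned settings.

```python
# Illustrative sketch only: stacking and weighted ensembles over the three
# base learners named in the abstract (random forest, SVM, neural network).
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, StackingRegressor, VotingRegressor
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for task features (X) and observed effort (y).
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
y = y - y.min() + 1.0  # shift so effort values are positive

base_learners = [
    ("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
    ("svm", make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0))),
    ("nn", make_pipeline(StandardScaler(),
                         MLPRegressor(hidden_layer_sizes=(32, 16),
                                      max_iter=2000, random_state=0))),
]

# Stacking ensemble: a linear meta-learner combines base-learner predictions.
stacking = StackingRegressor(estimators=base_learners,
                             final_estimator=LinearRegression(), cv=5)

# Weighted ensemble: a fixed-weight average of base-learner predictions.
weighted = VotingRegressor(estimators=base_learners, weights=[0.4, 0.3, 0.3])

# Compare each ensemble against its base learners with cross-validated MAE.
for name, model in base_learners + [("stacking", stacking), ("weighted", weighted)]:
    mae = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_absolute_error").mean()
    print(f"{name}: MAE = {mae:.2f}")
```

Likewise, a hedged sketch of LDA topic modeling over task descriptions, here using gensim with invented toy descriptions, reporting a per-word log-likelihood bound and a coherence score as stand-ins for the topic quality measures mentioned above:

```python
# Illustrative sketch only: LDA topic modeling over CSSD task descriptions.
from gensim.corpora import Dictionary
from gensim.models import LdaModel, CoherenceModel

descriptions = [
    "write selenium regression test suite for web application",
    "perform load and performance testing of rest api endpoints",
    "create unit tests and integration tests for java backend",
]
docs = [d.lower().split() for d in descriptions]

dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(doc) for doc in docs]

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
               passes=20, random_state=0)

print(lda.print_topics())                                   # top words per topic
print("per-word log-likelihood bound:", lda.log_perplexity(corpus))
print("c_v coherence:",
      CoherenceModel(model=lda, texts=docs, dictionary=dictionary,
                     coherence="c_v").get_coherence())
```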
Cite this article
Yasmin, A. Cost Adjustment for Software Crowdsourcing Tasks Using Ensemble Effort Estimation and Topic Modeling. Arab J Sci Eng 49, 12693–12728 (2024). https://doi.org/10.1007/s13369-024-08746-8