Contributions for the structural testing of multithreaded programs: coverage criteria, testing tool, and experimental evaluation
Abstract
Concurrent software testing is challenging because of factors absent from sequential programs, such as communication, synchronization, and non-determinism, which directly affect the testing process. Multithreaded programs impose further challenges on the testing activity. In the context of structural testing, an important open problem is how to cover shared variables, that is, how to establish definition-use associations over them. This paper presents results on the structural testing of multithreaded programs: testing criteria for coverage testing, a supporting tool called ValiPthread, and an experimental study. The study was conducted to evaluate the cost, effectiveness, and strength of the testing criteria, as well as their contribution to testing specific aspects of multithreaded programs. The experimental results provide evidence that the criteria have lower cost and higher effectiveness in revealing certain kinds of defects, such as deadlocks and blocked critical regions. Compared with sequential testing criteria, the proposed criteria also show that it is important to establish coverage testing specific to multithreaded programs.
Keywords
Multithreaded programs · Shared memory · PThreads · Structural testing · Coverage criteria · Experimental evaluation
Acknowledgements
The authors thank the São Paulo Research Foundation (FAPESP) for the financial support provided to this research (grants 2010/04042-1, 2013/05046-9, 2013/01818-7, and 2015/23653-5).