Workflows for quantitative data analysis in the social sciences

  • Kenneth J. Turner
  • Paul S. Lambert
Regular Paper


This paper first describes how quantitative social scientists use statistical analysis. Developing statistical analyses requires substantial effort, yet current practice has important limitations. This motivated the authors to create a more systematic and effective methodology with supporting tools. The paper presents an approach to modelling quantitative data analysis in the social sciences. Analysis scripts are treated abstractly as mathematical functions and concretely as web services, which allows individual scripts to be combined into high-level workflows. A comprehensive set of tools allows workflows to be defined, automatically validated and verified, and automatically implemented. The workflows expose opportunities for parallel execution, can define proper fault handling, and can be realised by non-technical users. Services, workflows and datasets can also be readily shared. The approach is illustrated with a realistic case study that analyses occupational position in relation to health.
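The idea of treating analysis scripts as functions that compose into a workflow can be illustrated with a minimal sketch. This is not the authors' tooling: the function names, the toy records and the use of a thread pool are all assumptions made for illustration, and each function stands in for what the paper would deploy as a web service.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical toy dataset: (occupational score, self-rated health) pairs.
RECORDS = [(30, 2), (45, 3), (60, 4), (75, 5)]


def mean_occupation(records):
    """Analysis step viewed as a function: dataset -> summary value."""
    return mean(score for score, _ in records)


def mean_health(records):
    """A second, independent analysis step over the same dataset."""
    return mean(health for _, health in records)


def combine(occupation, health):
    """Final step that merges the outputs of the two earlier steps."""
    return {"mean_occupation": occupation, "mean_health": health}


def run_workflow(records):
    """Compose the steps; the two summaries have no mutual dependency,
    so they are submitted to run in parallel."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        occupation = pool.submit(mean_occupation, records)
        health = pool.submit(mean_health, records)
        try:
            return combine(occupation.result(), health.result())
        except Exception as exc:
            # Simple fault handling: report the failure instead of crashing.
            return {"error": str(exc)}


result = run_workflow(RECORDS)
```

Because each step is a function of its inputs alone, the workflow makes the data dependencies explicit: steps without a shared dependency can run concurrently, and a failure in any step is caught at the point where the results are combined.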


e-Social science · Quantitative data analysis · Scientific workflow · Service-oriented architecture · Statistical analysis



The work reported in this paper was conducted on the Dames project, which was led by the second author. Dames was funded by the Economic and Social Research Council under grant number RES-149-25-1066. The authors are grateful to their colleagues on Dames for collaboration on the general topic of techniques for e-social science. Support for social science workflows was jointly developed by Koon Leai Larry Tan and the first author as part of the Dames project. Guy C. Warner built the occupational and educational portals for Dames, and created the server infrastructure needed for the work reported here. He also provided technical advice on using workflows in conjunction with the Dames servers.



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. Computing Science and Mathematics, University of Stirling, Stirling, Scotland, UK
  2. Applied Social Science, University of Stirling, Stirling, Scotland, UK
