Advertisement

Automated Software Engineering

, Volume 23, Issue 3, pp 361–392 | Cite as

Model inference for spreadsheets

  • Jácome CunhaEmail author
  • Martin Erwig
  • Jorge Mendes
  • João Saraiva
Article

Abstract

Many errors in spreadsheet formulas can be avoided if spreadsheets are built automatically from higher-level models that can encode and enforce consistency constraints in the generated spreadsheets. Employing this strategy for legacy spreadsheets is difficult, because the model has to be reverse engineered from an existing spreadsheet and existing data must be transferred into the new model-generated spreadsheet. We have developed and implemented a technique that automatically infers relational schemas from spreadsheets. This technique uses particularities from the spreadsheet realm to create better schemas. We have evaluated this technique in two ways: first, we have demonstrated its applicability by using it on a set of real-world spreadsheets. Second, we have run an empirical study with users. The study has shown that the results produced by our technique are comparable to the ones developed by experts starting from the same (legacy) spreadsheet data. Although relational schemas are very useful to model data, they do not fit spreadsheets well, as they do not allow expressing layout. Thus, we have also introduced a mapping between relational schemas and ClassSheets. A ClassSheet controls further changes to the spreadsheet and safeguards it against a large class of formula errors. The developed tool is a contribution to spreadsheet (reverse) engineering, because it fills an important gap and allows a promising design method (ClassSheets) to be applied to a huge collection of legacy spreadsheets with minimal effort.

Keywords

Spreadsheets ClassSheets Relational model Automatic model inference Empirical validation 

Notes

Acknowledgments

We would like to thank Orlando Belo for his help on running and analyzing the empirical study. We would also like to thank Paulo Azevedo for his help in conducting the statistical analysis of our empirical study. We would also like to thank the anonymous reviewers for their suggestions which helped us to improve the paper. This work is funded by ERDF - European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within project FCOMP-01-0124- FEDER-010048. The first author was also supported by FCT grant SFRH/BPD/73358/2010.

References

  1. Abraham, R., Erwig, M.: Inferring templates from spreadsheets. In: ICSE ’06 Proceedings of the 28th International Conference on Software Engineering, pp. 182–191. ACM, New York, NY, USA (2006)Google Scholar
  2. Abraham, R., Erwig, M., Kollmansberger, S., Seifert, E.: Visual specifications of correct spreadsheets. In: Proceedings of the 2005 IEEE Symposium on Visual Languages and Human-Centric Computing (VLHCC ’05), IEEE Computer Society, pp. 189–196 (2005)Google Scholar
  3. Alhajj, R.: Extracting the extended entity-relationship model from a legacy relational database. Inf. Syst. 28(6), 597–618 (2003)CrossRefzbMATHGoogle Scholar
  4. Cheng, T., Rival, X.: An abstract domain to infer types over zones in spreadsheets. In: Miné, A., Schmidt, D. (eds.) Static Analysis. Lecture Notes in Computer Science, vol. 7460, pp. 94–110. Springer, Berlin (2012)CrossRefGoogle Scholar
  5. Codd, E.F.: A relational model of data for large shared data banks. Commun ACM 13(6), 377–387 (1970)CrossRefzbMATHGoogle Scholar
  6. Connolly, T., Begg, C.: Database Systems, A Practical Approach to Design, Implementation, and Management, 3rd edn. Addison-Wesley, Harlow (2002)zbMATHGoogle Scholar
  7. Cunha, J.: Model-based spreadsheet engineering. PhD thesis, Universidade do Minho (2011)Google Scholar
  8. Cunha, J., Saraiva, J., Visser, J.: Discovery-based edit assistance for spreadsheets. In: 2009 IEEE Symposium on Visual Languages and Human-Centric Computing, IEEE, pp. 233–237 (2009a)Google Scholar
  9. Cunha, J., Saraiva, J., Visser, J.: From spreadsheets to relational databases and back. In: PEPM ’09 Proceedings of the 2009 ACM SIGPLAN workshop on Partial evaluation and program manipulation. ACM, New York, NY, USA, pp. 179–188 (2009b)Google Scholar
  10. Cunha, J., Erwig, M., Saraiva, J. Automatically inferring ClassSheet models from spreadsheets. In: Proceedings of the 2010 IEEE Symposium on Visual Languages and Human-Centric Computing, IEEE Computer Society, VLHCC ’10, pp. 93–100 (2010)Google Scholar
  11. Cunha J, Mendes, J., Fernandes, J.P., Saraiva, J.: Embedding and evolution of spreadsheet models in spreadsheet systems. In: IEEE Symposium on Visual Languages and Human-Centric Computing (VL/HCC’11), IEEE Computer Society, pp. 186–201 (2011a)Google Scholar
  12. Cunha, J., Visser, J., Alves, T., Saraiva, J.: Type-safe evolution of spreadsheets. In: Giannakopoulou, D., Orejas, F. (eds.) Fundamental Approaches to Software Engineering. Lecture Notes in Computer Science, vol. 6603, pp. 186–201. Springer, Berlin (2011b)CrossRefGoogle Scholar
  13. Cunha, J., Fernandes, J.P., Mendes, J., Pacheco, H., Saraiva, J.: Bidirectional transformation of model-driven spreadsheets. In: Hu, Z., de Lara, J. (eds.) Theory and Practice of Model Transformations. Lecture Notes in Computer Science, vol. 7307, pp. 105–120. Springer, Berlin (2012a)CrossRefGoogle Scholar
  14. Cunha, J., Fernandes, J.P., Mendes, J., Saraiva, J.: MDSheet: a framework for model-driven Spreadsheet Engineering. In: Proceedings of the 34rd ACM International Conference on Software Engineering (ICSE’12), pp. 1412–1415 (2012b)Google Scholar
  15. Cunha, J., Fernandes, J.P., Mendes, J., Saraiva, J.: Towards an evaluation of bidirectional model-driven spreadsheets. User evaluation for Software Engineering Researchers, ACM Digital Library, USER’ 12, 25–28 (2012c)Google Scholar
  16. Cunha, J., Fernandes, J.P., Saraiva, J.: From Relational ClassSheets to UML+OCL. The Software Engineering Track at the 27th Annual ACM Symposium On Applied Computing (SAC 2012), Riva del Garda (Trento), pp. 1151–1158. ACM, Italy (2012d)Google Scholar
  17. Cunha, J., Mendes, J., Saraiva, J., Visser, J.: Model-based programming environments for spreadsheets. Sci. Comput. Program. (2014a). doi: 10.1016/j.scico.2014.02.002
  18. Cunha, J., Fernandes, J.P., Mendes, J., Saraiva, J.: Embedding, evolution, and validation of spreadsheet models in spreadsheet systems. Tech. Rep. TR-HASLab:01:2014. HASLab—High-Assurance Software Laboratory & Universidade do Minho (2014b). http://alfa.di.uminho.pt/~jacome/down/tr_embedding.pdf
  19. Engels, G., Erwig, M.: ClassSheets: automatic generation of spreadsheet applications from object-oriented specifications. In: Redmiles, D., Ellman, T., Zisman, A. (eds.) 20th IEEE/ACM International Conference on Automated Software Engineering, Long Beach, USA, pp. 124–133. ACM (2005)Google Scholar
  20. Erwig, M.: Software engineering for spreadsheets. IEEE Softw. 29(5), 25–30 (2009)CrossRefGoogle Scholar
  21. EuSpRIG. European Spreadsheet Risks Interest Group (2010) http://www.eusprig.org/
  22. Hermans, F., Pinzger, M., van Deursen, A.: Automatically extracting class diagrams from spreadsheets. In: Proceedings of the 24th European conference on Object-oriented programming, Springer-Verlag, Berlin, Heidelberg, ECOOP’10, pp. 52–75 (2010)Google Scholar
  23. Höst, M., Regnell, B., Wohlin, C.: Using students as subjects–a comparative study of students and professionals in lead-time impact assessment. Empir. Softw. Eng. 5(3), 201–214 (2000). doi: 10.1023/A:1026586415054 CrossRefzbMATHGoogle Scholar
  24. Isakowitz, T., Schocken, S., Lucas Jr, H.C.: Toward a logical/physical theory of spreadsheet modeling. ACM Trans. Inf. Syst. 13(1), 1–37 (1995)CrossRefGoogle Scholar
  25. Lämmel, R., Visser, J. : A Strafunski application letter. In: Dahl, V., Wadler, P. (eds) Proceedings of Practical Aspects of Declarative Programming (PADL’03), LNCS, vol. 2562, pp. 357–375. Springer (2003)Google Scholar
  26. Maier, D.: The Theory of Relational Databases. Computer Science Press, Rockville (1983)zbMATHGoogle Scholar
  27. Novelli, N., Cicchetti, R.: Fun: an efficient algorithm for mining functional and embedded dependencies. In: ICDT ’01: Proceedings of the 8th International Conference on Database Theory, pp. 189–203. Springer-Verlag, London, UK (2001)Google Scholar
  28. Panko, R.: Spreadsheet errors: what we know. what we think we can do. In: Proceedings of the European Spreadsheet Risks Interest Group Annual Conference (EuSpRIG), pp. 7–17 (2000)Google Scholar
  29. Perry, D.E., Porter, A.A., Votta, L.G.: Empirical studies of software engineering: a roadmap. In: ICSE ’00 Proceedings of the Conference on The Future of Software Engineering, pp. 345–355. ACM, New York, NY, USA (2000)Google Scholar
  30. Powell, S.G., Baker, K.R.: The Art of Modeling with Spreadsheets. Wiley, New York (2003)Google Scholar
  31. Rajalingham, K., Chadwick, D., Knight, B.: Classification of spreadsheet errors. In: Proceedings of the European Spreadsheet Risks Interest Group Annual Conference (EuSpRIG), pp. 23–34 (2001)Google Scholar
  32. Ullman, J.D., Widom, J.: A First Course in Database Systems. Prentice Hall, Upper Saddle River (1997)Google Scholar
  33. Visser, E.: A survey of strategies in rule-based program transformation systems. J. Symb. Comput. 40, 831–873 (2005)MathSciNetCrossRefzbMATHGoogle Scholar
  34. Visser, J., Saraiva, J.: Tutorial on strategic programming across programming paradigms. In: 8th Brazilian Symposium on Programming Languages, Niteroi, Brazil (2004)Google Scholar

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  • Jácome Cunha
    • 1
    Email author
  • Martin Erwig
    • 2
  • Jorge Mendes
    • 3
  • João Saraiva
    • 3
  1. 1.Department of InformaticsHASLab/INESC Tec & Universidade Nova de LisboaLisbonPortugal
  2. 2.School of EECSOregon State UniversityCorvallisUSA
  3. 3.Department of InformaticsHASLab/INESC Tec & Universidade do MinhoBragaPortugal

Personalised recommendations