Conceptual Model Engineering for Industrial Safety Inspection Based on Spreadsheet Data Analysis

  • Nikita O. Dorodnykh
  • Aleksandr Yu. Yurin
  • Alexey O. Shigarov
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1126)


Conceptual models are the foundation of many modern intelligent systems and a theoretical basis for more in-depth scientific research. Such models can be created from various information sources (e.g., databases, spreadsheet data, and text documents) by means of reverse engineering. In this paper, we propose an approach that supports conceptual model engineering through the analysis and transformation of tabular data from CSV files. Industrial safety inspection (ISI) reports serve as examples for spreadsheet data analysis and transformation. The automated conceptual model engineering involves five steps and employs the following software: TabbyXL, for extracting canonical (relational) tables from arbitrary spreadsheet data in the CSV format; and Personal Knowledge Base Designer (PKBD), for generating conceptual model fragments from the analysis and transformation of canonical tables and for aggregating these fragments into a domain model. The approach was verified on a corpus of 216 spreadsheets extracted from six ISI reports. The obtained conceptual models can be used in the design of knowledge bases.
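The last two stages named in the abstract (deriving conceptual model fragments from canonical tables, then aggregating them into a domain model) can be illustrated with a minimal sketch. It does not use or reproduce TabbyXL or PKBD; the `ClassFragment` type, the function names, the rule "one column header becomes one class attribute", and the sample data are all assumptions made for illustration only.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class ClassFragment:
    """Hypothetical conceptual model fragment: one class with named attributes."""
    name: str
    attributes: set = field(default_factory=set)


def fragment_from_canonical_csv(class_name, csv_text):
    """Assumed rule: each column header of a canonical table becomes an attribute."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    return ClassFragment(class_name, {h.strip() for h in header if h.strip()})


def aggregate(fragments):
    """Merge fragments that share a class name into a single domain model."""
    model = {}
    for frag in fragments:
        merged = model.setdefault(frag.name, ClassFragment(frag.name))
        merged.attributes |= frag.attributes
    return model


# Invented sample tables, not taken from the ISI corpus.
table_a = "element,material,wall thickness\nshell,steel,12\n"
table_b = "element,defect\nshell,crack\n"

fragments = [
    fragment_from_canonical_csv("TechnicalObject", table_a),
    fragment_from_canonical_csv("TechnicalObject", table_b),
]
domain_model = aggregate(fragments)
print(sorted(domain_model["TechnicalObject"].attributes))
# → ['defect', 'element', 'material', 'wall thickness']
```

Merging by class name is the simplest possible aggregation strategy; a real tool would also have to reconcile synonymous names, data types, and associations between classes.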


Spreadsheet data · Conceptual models · Class diagram · UML · Model transformation · Industrial safety inspection



This work was supported by the Russian Science Foundation, grant number 18-71-10001.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Matrosov Institute for System Dynamics and Control Theory, Siberian Branch of the Russian Academy of Sciences, Irkutsk, Russia
