The multi-criteria evaluation of research efforts based on ETL software: from business intelligence approach to big data and semantic approaches

  • Review Article
  • Published:
Evolutionary Intelligence

Abstract

Industry and academia have devoted considerable effort and money to creating and using extract-transform-load (ETL) software suited to their data analysis purposes, since it is considered key to their success. As a result, research efforts based on ETL software are divided among well-known approaches such as Business Intelligence, Big Data, and Semantic. This division makes it difficult to keep up with changes and to handle the significant diversity of features across these approaches, which in turn disorients industries and academia when they search for, evaluate, and choose an ETL tool that fits the needs of their approach. These problems motivate our contribution. We use the systematic literature review (SLR) method to collect 207 papers, dated from 2010 to 2022, from three databases (ScienceDirect, Springer, and IEEE) and group them by ETL approach and by commonly used criteria. We then apply an existing method that automatically identifies the multi-criteria method adequate for this study; it selects the analytical hierarchy process (AHP), which we use to identify the best research paper according to the requirements of the scientific literature. The results are significant in several ways: they give a global view of research papers on ETL approaches, help customers remove uncertainty when selecting an ETL according to their specific approach needs, preferences, and interests, and enable future researchers and developers of ETL to decide where to focus and how to make innovative contributions that fill gaps in the literature.

Notes

  1. https://www.forrester.com/report/Topic-Overview-Business-Intelligence/RES39218%23.

  2. http://www.gartner.com/it-glossary/big-data/

  3. www.mcda.it/.

  4. https://www.sciencedirect.com/search.

  5. https://link.springer.com/search?query=&facet-content-type=%22Article%22.

  6. https://ieeexplore.ieee.org/Xplore/home.jsp.

  7. https://www.mendeley.com/.

Abbreviations

\({A}_{i}\): Alternatives

\({B}_{{P}_{j}}\): Pairwise comparison matrix of the elements \({P}_{j}\) to compare for each level of the hierarchy

\({B}_{{\text{AHP}}}\): Best alternative according to the AHP method

\({C}_{k}\): Criteria

\({C}_{kj}\): Sub-criteria

\(w\left({C}_{j}\right)\): Weights of criteria

\(w\left({C}_{kj}\right)\): Weights of sub-criteria with respect to the kth criterion

\({w\left({C}_{k}\right)}_{{A}_{j}}\): Weights of criterion \({C}_{k}\) in each alternative \({A}_{j}\)

\({w\left({C}_{kl}\right)}_{{A}_{j}}\): Weights of sub-criterion \({C}_{kl}\) in each alternative \({A}_{j}\)

\(w\left({C}_{k},{C}_{k1}\right)\): Global weights of sub-criteria with respect to the kth criterion

\(w\left({A}_{i}\right)\): Weights of alternatives

\(\alpha\): Problem of choice

\(\beta\): Problem of sorting

\(\gamma\): Problem of ranking

\({\lambda }_{{\text{max}}}\): Maximum eigenvalue

\({\text{AHP}}\): Analytical hierarchy process

\({\text{BI}}\): Business intelligence

\({\text{CI}}\): Consistency index

\({\text{CR}}\): Coherence ratio

\({\text{DM}}\): Decision maker

\({\text{ETL}}\): Extract-transform-load

\({\text{MCDA}}\): Multi-criteria decision analysis

\({\text{OWL}}\): Web Ontology Language

\({\text{RCI}}\): Random consistency index

\({\text{RDF}}\): Resource description framework

\({\text{SLR}}\): Systematic literature review

\({\text{WGMM}}\): Weighted geometric means method
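To show how several of the symbols above relate in a standard AHP calculation, here is a minimal sketch. It is not the article's own procedure: it assumes the row geometric mean approximation for the priority weights, estimates \({\lambda }_{{\text{max}}}\) from the product of the matrix and the weight vector, and uses Saaty's commonly tabulated RCI values. The function names and the example matrix are hypothetical.

```python
import math

# Saaty's commonly tabulated random consistency index (RCI), keyed by matrix size n.
RCI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
       6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_weights(B):
    """Priority weights from a pairwise comparison matrix B.

    Assumption: uses the row geometric mean approximation, then normalizes
    so that the weights sum to 1.
    """
    n = len(B)
    row_means = [math.prod(row) ** (1.0 / n) for row in B]
    total = sum(row_means)
    return [m / total for m in row_means]


def consistency(B, w):
    """Return (lambda_max, CI, CR) for matrix B and weight vector w."""
    n = len(B)
    # (B w)_i / w_i averaged over i estimates the maximum eigenvalue.
    Bw = [sum(B[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam_max = sum(Bw[i] / w[i] for i in range(n)) / n
    ci = (lam_max - n) / (n - 1) if n > 1 else 0.0
    rci = RCI.get(n, 0.0)
    cr = ci / rci if rci > 0 else 0.0
    return lam_max, ci, cr


# Hypothetical, perfectly consistent comparison of three criteria
# whose true importance ratio is 4 : 2 : 1.
B = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]

w = ahp_weights(B)
lam_max, ci, cr = consistency(B, w)
best = max(range(len(w)), key=lambda i: w[i])  # index of the highest-weighted element
print(w, lam_max, ci, cr, best)
```

A common rule of thumb accepts a comparison matrix when CR is below 0.1; judgments with a higher ratio are usually revisited before the weights are used.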


  154. Adnan Ilham AA, Usman S (2017) Performance analysis of extract, transform, load (ETL) in apache Hadoop atop NAS storage using ISCSI. In: 2017 4th International Conference on Computer Applications and Information Processing Technology (CAIPT), pp 1–5. https://doi.org/10.1109/CAIPT.2017.8320716

  155. Zdravevski E, Lameski P, Dimitrievski A, Grzegorowski M, Apanowicz C (2019) Cluster-size optimization within a cloud-based ETL framework for Big Data. In: 2019 IEEE international conference on big data (Big Data), pp 3754–3763

  156. Widanage C et al. (2020) High performance data engineering everywhere. In: Proceedings - 2020 IEEE international conference on smart data services, SMDS, Institute of Electrical and Electronics Engineers Inc., pp 122–132. https://doi.org/10.1109/SMDS49396.2020.00022

  157. Suleykin A, Panfilov P (2020) Metadata-driven industrial-grade ETL system. In: Proceedings - 2020 IEEE international conference on big data, Big Data, Institute of Electrical and Electronics Engineers Inc., pp 2433–2442. https://doi.org/10.1109/BigData50022.2020.9378367

  158. Tesfagiorgish DG, JunYi L (2015) Big data transformation testing based on data reverse engineering. In: 2015 IEEE 12th international conference on ubiquitous intelligence and computing and 2015 IEEE 12th international conference on autonomic and trusted computing and 2015 IEEE 15th international conference on scalable computing and communications and its associated workshops (UIC-ATC-ScalCom), IEEE, pp 649–652. https://doi.org/10.1109/UIC-ATC-ScalCom-CBDCom-IoP.2015.129

  159. Samarasinghe R, Perera G, Perera N, Senaratna P, Samarasingha L (2017) People clues: business intelligence tool for team dynamics. In: 2017 seventeenth international conference on advances in ICT for emerging regions (ICTer), pp 1–6

  160. Seay C, Agrawal R, Kadadi A, Barel Y (2015) Using Hadoop on the mainframe: a big solution for the challenges of big data. In: Proceedings-12th international conference on information technology: new generations, ITNG, Institute of Electrical and Electronics Engineers Inc., pp 765–769. https://doi.org/10.1109/ITNG.2015.135

  161. Muthyala R et al. (2017) Data-driven job search engine using skills and company attribute filters. In: IEEE International Conference on Data Mining Workshops, ICDMW, vol. 2017, IEEE Computer Society, pp 199–206

  162. Kim S-S, Yu S-H (2015) Architecture of geospatial big-data batch processing model based on Hadoop. In: 2015 international conference on information and communication technology convergence (ICTC), pp 964–966

  163. Adilah S et al. (2017) The challenges of extract, transform and loading (ETL) system implementation for near real-time environment. In: 2017 international conference on research and innovation in information systems (ICRIIS) pp 1–5

  164. Ma S et al. (2019) Bank big data architecture based on massive parallel processing database. In: Proceedings - 2018 15th international symposium on pervasive systems, algorithms and networks, I-SPAN, Institute of Electrical and Electronics Engineers Inc., pp 93–99. https://doi.org/10.1109/I-SPAN.2018.00024

  165. Moatti Y et al. (2017) Too big to eat: boosting analytics data ingestion from object stores with scoop. In: Proceedings - international conference on data engineering, IEEE Computer Society, pp 309–320. https://doi.org/10.1109/ICDE.2017.243

  166. Kholod II, Efimova MS (2017) Smart collection of data for financial instruments. In: 2017 XX IEEE international conference on soft computing and measurements (SCM), pp 705–708

  167. Houari ME, Rhanoui M, Asri BE (2017) Hybrid big data warehouse for On-demand decision needs. In: 2017 international conference on electrical and information technologies (ICEIT), pp 1–6

  168. Diouf PS, Boly A, Ndiaye S (2018) Variety of data in the ETL processes in the cloud: state of the art. In: International conference on innovative research and development (ICIRD), pp 1–5

  169. Diouf PS, Boly A, Ndiaye S (2017) Performance of the ETL processes in terms of volume and velocity in the cloud: state of the art. In: 2017 4th IEEE international conference on engineering technologies and applied sciences (ICETAS), pp 1–5

  170. Chou SC, Yang CT, Jiang FC, Chang CH (2018) The implementation of a data-accessing platform built from big data warehouse of electric loads. In: Proceedings - international computer software and applications conference, vol. 2, IEEE Computer Society, pp 87–92

  171. Figueiras P et al. (2017) User interface support for a big ETL data processing pipeline an application scenario on highway toll charging models. In: 2017 International conference on engineering, technology and innovation (ICE/ITMC), pp 1437–1444

  172. Xu B, Zhu S, Yu J, Li C, Sun Q (2017) Designing ETL processes to integrate multi-field digital information resources. In: 2017 2nd international conference on image, vision and computing (ICIVC), pp 1053–1057

  173. Deshpande PM, Margoor A, Venkatesh R (2018) Automatic tuning of SQL-on-Hadoop engines on cloud platforms. In: IEEE International Conference on Cloud Computing, CLOUD, vol. 2018, IEEE Computer Society, pp 508–515

  174. Bala M, Boussaid O, Alimazighi Z (2014) P-ETL: parallel-ETL based on the mapreduce paradigm. In: 2014 IEEE/ACS 11th international conference on computer systems and applications (AICCSA), pp 42–49

  175. Aluvalu R, Jabbar MA (2018) Handling data analytics on unstructured data using MongoDB. Smart Cities Symp 2018:1–5

    Google Scholar 

  176. Zeng YR, Chang YS, Fang YH (2019) Data visualization for air quality analysis on bigdata platform. In: 2019 international conference on system science and engineering (ICSSE), pp 313–317

  177. Azqueta-Alzuaz A, Patino-Martinez M, Brondino I, Jimenez-Peris R (2017) Massive data load on distributed database systems over HBase. In: Proceedings - 2017 17th IEEE/ACM international symposium on cluster, cloud and grid computing, CCGRID, Institute of Electrical and Electronics Engineers Inc., pp 776–779. https://doi.org/10.1109/CCGRID.2017.124

  178. Mehmood E, Anees T (2020) Challenges and solutions for processing real-time big data stream: a systematic literature review. IEEE Access 8:119123–119143. https://doi.org/10.1109/ACCESS.2020.3005268

    Article  Google Scholar 

  179. Plazas JE et al (2022) Sense, transform & send for the internet of things (STS4IoT): UML profile for data-centric IoT applications. Data Knowl Eng 139:101971

    Article  Google Scholar 

  180. Sanprasit N, Jampachaisri K, Titijaroonroj T, Kesorn K (2021) Intelligent approach to automated star-schema construction using a knowledge base. Expert Syst Appl 182:115226

    Article  Google Scholar 

  181. Antunes AL, Cardoso E, Barateiro J (2022) Incorporation of ontologies in data warehouse/business intelligence systems - a systematic literature review. Int J Inf Manag Data Insights. https://doi.org/10.1016/j.jjimei.2022.100131

    Article  Google Scholar 

  182. Deb Nath RP, Hose K, Pedersen TB, Romero O (2017) SETL: a programmable semantic extract-transform-load framework for semantic data warehouses. Inf Syst 68:17–43

    Article  Google Scholar 

  183. Simitsis A, Skoutas D, Castellanos M (2010) Representation of conceptual ETL designs in natural language using Semantic Web technology. Data Knowl Eng 69:96–115

    Article  Google Scholar 

  184. Teixeira MAC, Belloze KT, Cavalcanti MC, Silva-Junior FP (2018) Data mart construction based on semantic annotation of scientific articles: a case study for the prioritization of drug targets. Comput Methods Programs Biomed 157:225–235

    Article  Google Scholar 

  185. Ta’a A, Abdullah MS (2011) Goal-ontology approach for modeling and designing ETL processes. Proc Comput Sci 3:942–948

    Article  Google Scholar 

  186. Khouri S, Berkani N, Bellatreche L (2017) Tracing data warehouse design lifecycle semantically. Comput Stand Interf 51:132–151

    Article  Google Scholar 

  187. Kang TW, Hong CH (2015) A study on software architecture for effective BIM/GIS-based facility management data integration. Autom Constr 54:25–38

    Article  Google Scholar 

  188. Kilias T, Löser A, Andritsos P (2015) INDREX: in-database relation extraction. Inf Syst 53:124–144

    Article  Google Scholar 

  189. Marco-Ruiz L, Moner D, Maldonado JA, Kolstrup N, Bellika JG (2015) Archetype-based data warehouse environment to enable the reuse of electronic health record data. Int J Med Inform 84:702–714

    Article  Google Scholar 

  190. Mendoza M, Alegría E, Maca M, Cobos C, León E (2015) Multidimensional analysis model for a document warehouse that includes textual measures. Decis Support Syst 72:44–59

    Article  Google Scholar 

  191. Selma K et al (2012) Ontology-based structured web data warehouses for sustainable interoperability: requirement modeling, design methodology and tool. Comput Ind 63:799–812

    Article  Google Scholar 

  192. Nebot V, Berlanga R (2012) Building data warehouses with semantic web data. Decis Support Syst 52:853–868

    Article  Google Scholar 

  193. Kraiem MB, Feki J, Khrouf K, Ravat F, Teste O (2015) Modeling and OLAPing social media: the case of Twitter. Soc Netw Anal Min 5:1–15

    Article  Google Scholar 

  194. Salem R, Boussaïd O, Darmont J (2013) Active XML-based Web data integration. Inf Syst Front 15:371–398

    Article  Google Scholar 

  195. Khouri S, Bellatreche L (2017) Design life-cycle-driven approach for data warehouse systems configurability. J Data Semant 6:83–111

    Article  Google Scholar 

  196. Villarroya S, Viqueira JRR, Regueiro MA, Taboada JA, Cotos JM (2016) SODA: a framework for spatial observation data analysis. Distrib Parallel Databases 34:65–99

    Article  Google Scholar 

  197. Araibi N, Ben Ahmed E, Karaa Ben Abdessalem W (2016) \(\mathcal {IRORS}\): intelligent recommendation of RSS feeds. Vietnam J Comput Sci 3:47–56

    Article  Google Scholar 

  198. Boukhari I, Jean S, Ait-Sadoune I, Bellatreche L (2018) The role of user requirements in data repository design. Int J Softw Tools Technol Transf 20:19–34

    Article  Google Scholar 

  199. Miyoshi NSB, Pinheiro DG, Silva WA, Felipe JC (2013) Computational framework to support integration of biomolecular and clinical data within a translational approach. BMC Bioinf 14:1–12

    Article  Google Scholar 

  200. Moalla I, Nabli A, Bouzguenda L, Hammami M (2017) Data warehouse design approaches from social media: review and comparison. Social Netw Anal Min. https://doi.org/10.1007/s13278-017-0423-8

    Article  Google Scholar 

  201. Xu Y et al (2019) An information integration and transmission model of multi-source data for product quality and safety. Inf Syst Front 21:191–212

    Article  Google Scholar 

  202. Sideridis S, Pelekis N, Theodoridis Y (2016) On querying and mining semantic-aware mobility timelines. Int J Data Sci Anal 2:29–44

    Article  Google Scholar 

  203. Priyatna F, Alonso-Calvo R, Paraiso-Medina S, Corcho O (2017) Querying clinical data in HL7 RIM based relational model with morph-RDB. J Biomed Semant 8:1–12

    Article  Google Scholar 

  204. Pressat-Laffouilhère T et al (2022) Evaluation of Doc’EDS: a French semantic search tool to query health documents from a clinical data warehouse. BMC Med Inform Decis Mak 22:34

    Article  Google Scholar 

  205. Haberson A, Rinner C, Schöberl A, Gall W (2019) Feasibility of mapping Austrian health claims data to the OMOP common data model. J Med Syst 43:1–5

    Article  Google Scholar 

  206. Omidvar A, Garakani M, Safarpour HR (2014) Context based user ranking in forums for expert finding using WordNet dictionary and social network analysis. Inf Technol Manag 15:51–63

    Article  Google Scholar 

  207. Geibel P et al (2015) Ontology-based information extraction: identifying eligible patients for clinical trials in neurology. J Data Semant 4:133–147

    Article  Google Scholar 

  208. Carrasco RA, Muñoz-Leiva F, Hornos MJ (2013) A multidimensional data model using the fuzzy model based on the semantic translation. Inf Syst Front 15:351–370

    Article  Google Scholar 

  209. Girardi D, Dirnberger J, Giretzlehner M (2015) An ontology-based clinical data warehouse for scientific research. Safety in Health, vol. 1. http://www.safetyinhealth.com/content/1/1/6

  210. Berkani N, Bellatreche L, Khouri S (2013) Towards a conceptualization of ETL and physical storage of semantic data warehouses as a service. Cluster Comput 16:915–931

    Article  Google Scholar 

  211. Berkani N, Bellatreche L, Khouri S, Ordonez C (2020) The contribution of linked open data to augment a traditional data warehouse. J Intell Inf Syst 55:397–421

    Article  Google Scholar 

  212. Lopes P, Luís Oliveira J (2012) COEUS: ‘semantic web in a box’ for biomedical applications. J Biomed Semant 3:1–19

    Article  Google Scholar 

  213. Hanna J, Joseph E, Brochhausen M, Hogan WR (2013) Building a drug ontology based on RxNorm and other sources. J Biomed Semant 4:1–9

    Article  Google Scholar 

  214. del Carmen Legaz-García M, Miñarro-Giménez JA, Menárguez-Tortosa M, Fernández-Breis JT (2016) Generation of open biomedical datasets through ontology-driven transformation and integration processes. J Biomed Semant 7:1–17

    Article  Google Scholar 

  215. Essa YM, Attiya G, El-Sayed A, ElMahalawy A (2018) Data processing platforms for electronic health records. Health Technol 8:271–280

    Article  Google Scholar 

  216. Pannarale P et al (2012) GIDL: a rule based expert system for GenBank intelligent data loading into the molecular biodiversity database. BMC Bioinf 13:1–14

    Article  Google Scholar 

  217. Moalla I, Nabli A, Hammami M (2022) Data warehouse building to support opinion analysis in social media. Soc Netw Anal Min 12:123

    Article  Google Scholar 

  218. Iksan LH et al. (2021) Implementation of cloud based action recognition backend platform. In: 2021 international conference on artificial intelligence and mechatronics systems (AIMS), pp 1–6. https://doi.org/10.1109/AIMS52415.2021.9466068

  219. El Hafyani H, Abboud M, Taher Y (2021) A microservices based architecture for implementing and automating ETL data pipelines for mobile crowdsensing applications. In: 2021 IEEE international conference on big data (Big Data), pp 5909–5911. https://doi.org/10.1109/BigData52589.2021.9671382

  220. Milev I, Zajc M (2022) Tangible information for active consumers: data from smart home device and smart meter become customer newsletters. In: 2022 30th telecommunications forum (TELFOR), pp 1–4. https://doi.org/10.1109/TELFOR56187.2022.9983708

  221. Catovic A, Kadusic E, Ruland C, Zivic N, Hadzajlic N (2022) Air pollution prediction and warning system using IoT and machine learning. In: 2022 international conference on electrical, computer, communications and mechatronics engineering (ICECCME), pp 1–4. https://doi.org/10.1109/ICECCME55909.2022.9987957

  222. Younes AB, Ayed LB, Najjar M (2022) Intelligent assistance with ML in data mapping ETL processing. In: 2022 IEEE Information Technologies & Smart Industrial Systems (ITSIS), pp 1–4. https://doi.org/10.1109/ITSIS56166.2022.10118369

  223. Valtolina S, Ferrari L, Mesiti M (2019) Ontology-based consistent specification of sensor data acquisition plans in cross-domain iot platforms. IEEE Access 7:176141–176169

    Article  Google Scholar 

  224. Onal AC, Berat Sezer O, Ozbayoglu M, Dogdu E (2017) Weather data analysis and sensor fault detection using an extended IoT framework with semantics, big data, and machine learning. In: 2017 IEEE international conference on big data (Big Data), pp 2037–2046

  225. Sutheparaks U, Vatanawood W, Patanothai C (2011) Defining global schema for ETL of human resource performance appraisal system using REA ontology. In: 011 eighth international joint conference on computer science and software engineering (JCSSE), IEEE, pp 275–280

  226. Lee S, Park BH, Lim SH, Shankar M (2015) Table2Graph: a scalable graph construction from relational tables using map-reduce. In: Proceedings - 2015 IEEE 1st international conference on big data computing service and applications, BigDataService, Institute of Electrical and Electronics Engineers Inc., pp 294–301. https://doi.org/10.1109/BigDataService.2015.52

  227. Nebot V, Berlanga R (2010) Populating data warehouses with semantic data. IEEE Lat Am Trans 8:150–157

    Article  Google Scholar 

  228. Marx E, Shekarpour S, Auer S, Ngomo ACN (2013) Large-scale RDF dataset slicing. In: Proceedings - 2013 IEEE 7th international conference on semantic computing, ICSC, pp 228–235. https://doi.org/10.1109/ICSC.2013.47

  229. McCarthy S, McCarren A, Roantree M (2019) A method for automated transformation and validation of online datasets. In: Proceedings - 2019 IEEE 23rd international enterprise distributed object computing conference, EDOC, Institute of Electrical and Electronics Engineers Inc., pp 183–189. https://doi.org/10.1109/EDOC.2019.00030

  230. Jiang L, Cai H, Xu B (2010) A domain ontology approach in the ETL process of data warehousing. Proc- IEEE Int Conf E-Business Eng, ICEBE 2010:30–35. https://doi.org/10.1109/ICEBE.2010.36

    Article  Google Scholar 

  231. Huang OR, Ou YL, Zhang MH, Zhang C (2012) Application of ontology-based automatic ETL in marine data integration. IEEE symposium on electrical & electronics engineering

  232. Chang YS, Lin KM, Tsai YT, Zeng YR, Hun CX (2018) Big data platform for air quality analysis and prediction. In: 2018 27th wireless and optical communication conference (WOCC), pp 1–3

  233. Berkani N, Bellatreche L, Ordonez C (2018) ETL-aware materialized view selection in semantic data stream warehouses. In: 2018 12th international conference on research challenges in information science (RCIS), pp 1–11

  234. Abelló A et al (2015) Using semantic web technologies for exploratory OLAP: a survey. IEEE Trans Knowl Data Eng 27:571–588

    Article  Google Scholar 

  235. Gollapudi S (2015) Aggregating financial services data without assumptions: a semantic data reference architecture. In: Proceedings of the 2015 IEEE 9th international conference on semantic computing (IEEE ICSC 2015), pp 312–315

  236. Berkani N, Khouri S, Bellatreche L (2012) Generic methodology for semantic data warehouse design: From schema definition to ETL. In: Proceedings of the 2012 4th international conference on intelligent networking and collaborative systems, INCoS, pp 404–411. https://doi.org/10.1109/iNCoS.2012.108

  237. Bansal SK (2014) Towards a semantic extract-transform-load (ETL) framework for big data integration. In: Proceedings - 2014 IEEE international congress on big data, BigData Congress 2014, Institute of Electrical and Electronics Engineers Inc., pp 522–529. https://doi.org/10.1109/BigData.Congress.2014.82

  238. Abdellaoui S, Nader F (2015) Semantic data warehouse at the heart of competitive intelligence systems: design approach. In: 2015 6th international conference on information systems and economic intelligence (SIIE), IEEE

  239. Hoppe T, Humm B, Reibold A (2018) Semantic applications: methodology, technology, corporate use. Semantic applications: methodology, technology, corporate use. https://doi.org/10.1007/978-3-662-55433-3

  240. Madsen MR (2009) The role of open source in data integration. Third nature Technology Report

Download references

Author information

Contributions

The study's inception and design involved input from all authors. CB and MRCL prepared the method, collected the data, and carried out the analysis. CB and HB wrote the manuscript's initial draft, and MRCL offered feedback on earlier drafts. ZB corrected, reviewed, and edited the manuscript. The final manuscript was reviewed and approved by all authors.

Corresponding author

Correspondence to Chaimae Boulahia.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Human and animal rights

This article does not contain any studies with human or animal subjects performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Tables 9, 10 and 11.

Table 9 Pair-wise comparison matrix and weights of criteria (Level 1), sub-criteria (Level 2), and global weights of sub-criteria
Table 10 Pair-wise comparison matrix and weight of alternatives in each sub-criterion (Level 3)
Table 11 Decision matrix of the AHP method

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Boulahia, C., Behja, H., Chbihi Louhdi, M.R. et al. The multi-criteria evaluation of research efforts based on ETL software: from business intelligence approach to big data and semantic approaches. Evol. Intel. (2024). https://doi.org/10.1007/s12065-023-00899-z

  • DOI: https://doi.org/10.1007/s12065-023-00899-z
