Abstract
Industry and academia have invested considerable effort and money in creating or adopting extract-transform-load (ETL) software suited to their data-analysis purposes, since such software is considered key to their success. Consequently, research on ETL software is divided among well-known approaches such as Business Intelligence, Big Data, and Semantic ETL. This division makes it hard to keep up with changes and to handle the wide diversity of features across these approaches, which in turn leaves industrial and academic users disoriented when they must find, evaluate, and choose an ETL tool that meets the needs of their approach. These problems motivated us to use the systematic-literature-review (SLR) method to collect 207 papers published between 2010 and 2022 from three databases, namely ScienceDirect, Springer, and IEEE, and to group them by both ETL approach and commonly used criteria. We then applied an existing method that automatically identifies the most suitable multicriteria method for this study. That method selected the analytical hierarchy process (AHP), which we used to identify the best research paper according to the requirements of the scientific literature. The results show that this study is significant in several ways: it provides a global overview of research papers on ETL approaches; it allows customers to remove uncertainty when selecting an ETL tool according to their approach-specific needs, preferences, and interests; and it enables future ETL researchers and developers to decide where to focus and how to make innovative contributions that fill gaps in the literature.
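The AHP step mentioned above can be illustrated with a minimal sketch. It assumes a reciprocal pairwise comparison matrix on Saaty's 1 to 9 scale, approximates the priority vector by the normalised geometric means of the rows (a common stand-in for the principal eigenvector), and computes \(\lambda_{\max}\), the consistency index \(\text{CI} = (\lambda_{\max} - n)/(n-1)\), and the consistency ratio \(\text{CR} = \text{CI}/\text{RCI}\) from Saaty's published random-index values. The function name `ahp_priorities` and the three-criterion matrix are hypothetical and are not taken from the study.

```python
import math

# Saaty's random consistency index (RCI) for matrices of order 1..10.
RCI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
       6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_priorities(matrix):
    """Return (weights, lambda_max, CI, CR) for a reciprocal pairwise matrix.

    Weights are approximated by the normalised geometric mean of each row.
    """
    n = len(matrix)
    row_gm = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(row_gm)
    weights = [g / total for g in row_gm]

    # Estimate lambda_max as the mean of (A w)_i / w_i over all rows.
    aw = [sum(a * w for a, w in zip(row, weights)) for row in matrix]
    lambda_max = sum(x / w for x, w in zip(aw, weights)) / n

    ci = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    cr = ci / RCI[n] if RCI[n] > 0 else 0.0
    return weights, lambda_max, ci, cr


if __name__ == "__main__":
    # Hypothetical judgements for three criteria: C1 is twice as important
    # as C2 and four times as important as C3 (a fully consistent matrix).
    A = [[1, 2, 4],
         [1 / 2, 1, 2],
         [1 / 4, 1 / 2, 1]]
    w, lam, ci, cr = ahp_priorities(A)
    print([round(x, 3) for x in w], round(lam, 3), round(ci, 3))
```

By Saaty's usual rule of thumb, a CR below 0.10 is treated as acceptably consistent; a perfectly consistent matrix such as the one above yields \(\lambda_{\max} = n\) and a CR of essentially zero.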
Abbreviations
- \({A}_{i}\): Alternatives
- \({B}_{{P}_{j}}\): Pairwise comparison matrix of the elements \({P}_{j}\) compared at each level of the hierarchy
- \({B}_{{\text{AHP}}}\): Best alternative according to the AHP method
- \({C}_{k}\): Criteria
- \({C}_{kj}\): Sub-criteria
- \(w\left({C}_{j}\right)\): Weights of criteria
- \(w\left({C}_{kj}\right)\): Weights of sub-criteria with respect to the kth criterion
- \({w\left({C}_{k}\right)}_{{A}_{j}}\): Weights of criterion \({C}_{k}\) in each alternative \({A}_{j}\)
- \({w\left({C}_{kl}\right)}_{{A}_{j}}\): Weights of sub-criterion \({C}_{kl}\) in each alternative \({A}_{j}\)
- \(w\left({C}_{k},{C}_{k1}\right)\): Global weights of sub-criteria with respect to the kth criterion
- \(w\left({A}_{i}\right)\): Weights of alternatives
- \(\alpha\): Problem of choice
- \(\beta\): Problem of sorting
- \(\gamma\): Problem of ranking
- \({\lambda }_{{\text{max}}}\): Maximum eigenvalue
- \({\text{AHP}}\): Analytical hierarchy process
- \({\text{BI}}\): Business intelligence
- \({\text{CI}}\): Consistency index
- \({\text{CR}}\): Consistency ratio
- \({\text{DM}}\): Decision maker
- \({\text{ETL}}\): Extract, transform, load
- \({\text{MCDA}}\): Multi-criteria decision analysis
- \({\text{OWL}}\): Web Ontology Language
- \({\text{RCI}}\): Random consistency index
- \({\text{RDF}}\): Resource description framework
- \({\text{SLR}}\): Systematic literature review
- \({\text{WGMM}}\): Weighted geometric mean method
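The symbols above describe a hierarchy in which criterion weights \(w(C_k)\) and local sub-criterion weights are combined into global weights, which are then used to score the alternatives \(A_i\) and select \(B_{\text{AHP}}\). A minimal sketch of that synthesis follows. It assumes that a sub-criterion's global weight is the product of its local weight and its parent criterion's weight, and that the weighted geometric mean method (WGMM) aggregates the alternatives' local priorities; whether the study applies WGMM at exactly this step is an assumption. The dictionary layout, the function names `global_weights` and `wgmm_scores`, and all numbers are hypothetical illustrations, not the study's data.

```python
import math


def global_weights(criteria_w, sub_w):
    """Combine w(C_k) with local sub-criterion weights into global weights.

    criteria_w: {criterion: weight}; sub_w: {criterion: {sub_criterion: weight}}.
    Returns {(criterion, sub_criterion): w(C_k) * w(C_kl)}.
    """
    return {(c, s): criteria_w[c] * w
            for c, subs in sub_w.items()
            for s, w in subs.items()}


def wgmm_scores(local, gw):
    """Weighted geometric mean of each alternative's local priorities.

    local: {alternative: {(criterion, sub_criterion): priority}}.
    Scores are normalised so that they sum to 1.
    """
    raw = {a: math.prod(p[key] ** w for key, w in gw.items())
           for a, p in local.items()}
    total = sum(raw.values())
    return {a: v / total for a, v in raw.items()}


if __name__ == "__main__":
    # Hypothetical hierarchy: two criteria, three sub-criteria in total.
    criteria_w = {"C1": 0.6, "C2": 0.4}
    sub_w = {"C1": {"C11": 0.5, "C12": 0.5}, "C2": {"C21": 1.0}}
    gw = global_weights(criteria_w, sub_w)

    # Hypothetical local priorities of two alternatives per sub-criterion.
    local = {
        "A1": {("C1", "C11"): 0.8, ("C1", "C12"): 0.6, ("C2", "C21"): 0.3},
        "A2": {("C1", "C11"): 0.2, ("C1", "C12"): 0.4, ("C2", "C21"): 0.7},
    }
    scores = wgmm_scores(local, gw)
    best = max(scores, key=scores.get)  # plays the role of B_AHP
    print(scores, best)
```

With these illustrative numbers, A1 ranks first because it leads on both sub-criteria of the heavier criterion C1. Replacing the weighted product with a weighted sum would give the classical additive AHP synthesis instead.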
References
Codd EF (1970) A relational model of data for large shared data banks. Commun ACM 13:377–387
Inmon WH (1990) Using ORACLE to build decision support systems. (QED Information Sciences, 1990)
Watson HJ, Goodhue DL, Wixom BH (2002) The benefits of data warehousing: why some organizations realize exceptional payoffs. Inf Manag 39:491–502
Werner D (2015) ETL yesterday, today and tomorrow: something borrowed, something green. LinkedIn Pulse
Nwokeji JC, Matovu RA (2021) Systematic literature review on big data extraction, transformation and loading (ETL). In: Intelligent computing-proceedings of the 2021 computing conference. https://doi.org/10.1007/978-3-030-80126-7_24
Chen H, Chiang RHL, Storey VC (2012) Business intelligence and analytics: from big data to big impact. MIS Q 36:1165–1185
Bergamaschi S, Guerra F, Orsini M, Sartori C, Vincini M (2011) A semantic approach to ETL technologies. Data Knowl Eng 70:717–731
Guarda T et al. (2017) Internet of Things challenges. In: 2017 12th Iberian conference on information systems and technologies (CISTI), pp 1–4
Naik U, Shivalingaiah D (2008) Comparative Study of Web 1.0, Web 2.0 and Web 3.0. In: 6th International CALIBER
Aghaei S, Nematbakhsh MA, Farsani HK (2012) Evolution of the world wide web: from WEB 1.0 TO WEB 4.0. Int J Web Semant Technol 3:1–10
Chakraborty J, Padki A, Bansal SK (2017) Semantic ETL-State-of-the-Art and open research challenges. In: Proceedings-IEEE 11th international conference on semantic computing, ICSC (Institute of Electrical and Electronics Engineers Inc., 2017), pp 413–418 https://doi.org/10.1109/ICSC.2017.94
Haryono EM et al. (2020) Comparison of the E-LT vs ETL method in data warehouse implementation: a qualitative study. In: Proceedings - 2nd international conference on informatics, multimedia, cyber, and information system, ICIMCIS. https://doi.org/10.1109/ICIMCIS51567.2020.9354284
Hanine M, Boutkhoum O, Tikniouine A, Agouti T (2016) Application of an integrated multi-criteria decision making AHP-TOPSIS methodology for ETL software selection. Springerplus 5
Vassiliadis P, Simitsis A, Georgantas P, Terrovitis M, Skiadopoulos S (2005) A generic and customizable framework for the design of ETL scenarios. Inf Syst 30:492–525
Langseth J, Vivatrat N (2003) Why proactive business intelligence is a hallmark of the real-time enterprise: outward bound. Intell Enterp 5:34–41
Negash S, Gray P (2003) Business intelligence. Commun Assoc Inf Sys 13:15
Yeh PZ, Puri CA (2010) An efficient and robust approach for discovering data quality rules. In: Proceedings-international conference on tools with artificial intelligence, ICTAI
Beyer MA, Laney D (2012) The importance of ‘big data’: a definition. Stamford, CT: Gartner
Gruber TR (1993) A translation approach to portable ontology specifications. Knowl Acquis 5:199–220
Gruber TR, Olsen GR (1994) An ontology for engineering mathematics. Princ Knowl Represent Reason. https://doi.org/10.1016/b978-1-4832-1452-8.50120-2
Boulahia C, Behja H, Louhdi MRC (2020) Towards semantic ETL for integration of textual scientific documents in a Big Data environment: a theoretical approach. In: Colloquium in information science and technology, CIST, Institute of Electrical and Electronics Engineers Inc., pp 133–138
Roy B (1996) Multicriteria methodology for decision aiding, vol 12. Springer Science & Business Media
Akinnuwesi B, Uzoka F (2017) Assessment of software project proposal using analytical hierarchy process: a framework. J Res Rev Sci 4:44–55
Czekster RM, Webber T, Jandrey AH, Marcon CAM (2019) Selection of enterprise resource planning software using analytic hierarchy process. Enterp Inf Syst 13:895–915
Lu G, Wang H, Mao X (2010) Using ELECTRE TRI outranking method to evaluate trustworthy software. In: Lecture notes in computer science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), LNCS, vol 6407
Freire DL, Frantz RZ, Roos-Frantz F (2019) Ranking enterprise application integration platforms from a performance perspective: an experience report. Softw Pract Exp 49:921–941
Beecham S, Baddoo N, Hall T, Robinson H, Sharp H (2008) Motivation in software engineering: a systematic literature review. Inf Softw Technol. https://doi.org/10.1016/j.infsof.2007.09.004
Wątróbski J, Jankowski J, Ziemba P, Karczmarczyk A, Zioło M (2019) Generalised framework for multi-criteria method selection. Omega 86:107–124
Kitchenham B (2007) Guidelines for performing systematic literature reviews in software engineering. Technical report, Ver. 2.3 EBSE technical report, EBSE
Keele S (2007) Guidelines for performing systematic literature reviews in software engineering. Technical report, Ver. 2.3 EBSE technical report, EBSE
Yoon KP, Hwang CL (1995) Multiple attribute decision making: an introduction. Sage publications
Saaty TL (1990) How to make a decision: the analytic hierarchy process. Eur J Oper Res 48:9–26
Krejčí J, Stoklasa J (2018) Aggregation in the analytic hierarchy process: why weighted geometric mean should be used instead of weighted arithmetic mean. Expert Syst Appl 114:97–106
Saaty TL (2000) Fundamentals of decision making and priority theory, 2nd edn. RWS Publications
Yu Y et al (2022) Developing an ETL tool for converting the PCORnet CDM into the OMOP CDM to facilitate the COVID-19 data integration. J Biomed Inform 127:104002
Almeida JR, Coelho L, Oliveira JL (2021) BIcenter: a collaborative Web ETL solution based on a reflective software approach. SoftwareX 16:100892
Silva VS, Matas L, Moreira T, Segundo WC (2022) An ETL strategy for integrating the la Referencia platform and VIVO for the Brazilian CRIS. Procedia computer science, vol 211. Elsevier, pp 111–117
Sherman R (2015) Data integration processes. In: Business intelligence guidebook, pp 301–333. https://doi.org/10.1016/b978-0-12-411461-6.00012-5
Sherman R (2015) Technology & product architectures. In: Business intelligence guidebook, pp 143–169. https://doi.org/10.1016/b978-0-12-411461-6.00007-1
Masseroli M (2018) Integrative bioinformatics. Encycl Bioinf Comput Biol: ABC Bioinf 1–3:1092–1098
Sulaiman NS, Yahaya JH (2013) Development of dashboard visualization for cardiovascular disease based on star scheme. Proc Technol 11:455–462
Souibgui M, Atigui F, Zammali S, Cherfi S, Yahia S. Ben (2019) Data quality in ETL process: a preliminary study. Procedia computer science, vol 159. Elsevier, pp 676–687
Laraichi S, Hammani A, Bouignane A (2016) Data integration as the key to building a decision support system for groundwater management: Case of Saiss aquifers, Morocco. Groundw Sustain Dev 2–3:7–15
Zhou X et al (2010) Development of traditional Chinese medicine clinical data warehouse for medical knowledge discovery and decision support. Artif Intell Med 48:139–152
Linstedt D, Olschimke M (2016) Introduction to data warehousing. Data Vault 2:1–15. https://doi.org/10.1016/b978-0-12-802510-9.00001-5
Llave MR (2018) Data lakes in business intelligence: reporting from the trenches. Proc Comput Sci 138:516–524
Longo A, Giacovelli S, Bochicchio MA (2014) Fact – centered ETL: a proposal for speeding business analytics up. Proc Technol 16:471–480
Nadkarni P (2016) Clinical data repositories: warehouses, registries, and the use of standards. In: Clinical Research Computing, pp 173–185. https://doi.org/10.1016/b978-0-12-803130-8.00009-9
Nisbet R, Miner G, Yale K (2018) Accessory tools for doing data maccessory tools for doing data miningining. Handb Stat Anal Data Min Appl. https://doi.org/10.1016/b978-0-12-416632-5.00006-2
Prasser F, Spengler H, Bild R, Eicher J, Kuhn KA (2019) Privacy-enhancing ETL-processes for biomedical data. Int J Med Inform 126:72–81
Boulil K, Le Ber F, Bimonte S, Grac C, Cernesson F (2014) Multidimensional modeling and analysis of large and complex watercourse data: an OLAP-based solution. Ecol Inform 24:90–106
Han J, Kamber M, Pei J (2012) Introduction. Data Mining, pp 1–38. https://doi.org/10.1016/b978-0-12-381479-1.00001-0
Han J, Kamber M, Pei J (2012) Data warehousing and online analytical processing. Data Min. https://doi.org/10.1016/b978-0-12-381479-1.00004-6
Johnston T (2014) Bitemporal data and the Kimball data warehouse. Bitemporal Data. https://doi.org/10.1016/b978-0-12-408067-6.00018-8
Khan FA et al (2017) Efficient data access and performance improvement model for virtual data warehouse. Sustain Cities Soc 35:232–240
Villar A, Zarrabeitia MT, Fdez-Arroyabe P, Santurtún A (2018) Integrating and analyzing medical and environmental data using ETL and business intelligence tools. Int J Biometeorol 62:1085–1095
Silveira PS, Becker K, Ruiz DD (2010) SPDW+: a seamless approach for capturing quality metrics in software development environments. Softw Qual J 18:227–268
Papastefanatos G, Vassiliadis P, Simitsis A, Vassiliou Y (2012) Metrics for the prediction of evolution impact in ETL ecosystems: a case study. J Data Semant 1:75–97
Fleuren LM et al (2021) The Dutch data warehouse, a multicenter and full-admission electronic health records database for critically ill COVID-19 patients. Crit Care 25:1–12
Bruland P et al (2016) Common data elements for secondary use of electronic health record data for clinical trial execution and serious adverse event reporting. BMC Med Res Methodol 16:1–10
Rosenkranz C, Holten R, Räkers M, Behrmann W (2017) Supporting the design of data integration requirements during the development of data warehouses: a communication theory-based Approach. Eur J Inf Syst 26:84–115
Ali SMF, Wrembel R (2017) From conceptual design to performance optimization of ETL workflows: current state of research and open problems. VLDB J 26:777–801
Bender B, Bertheau C, Körppen T, Lauppe H, Gronau N (2022) A proposal for future data organization in enterprise systems—an analysis of established database approaches. IseB 20:441–494
Hughes G, Dobbins C (2015) The utilization of data analysis techniques in predicting student performance in massive open online courses (MOOCs). Res Pract Technol Enhanc Learn 10:1–10
Petrović M et al (2017) Automating ETL processes using the domain-specific modeling approach. IseB 15:425–460
Prevedello LM, Andriole KP, Hanson R, Kelly P, Khorasani R (2010) Business intelligence tools for radiology: creating a prototype model using open-source tools. J Digit Imaging 23:133–141
Guo SS, Yuan ZM, Sun AB, Yue Q (2015) A new ETL approach based on data virtualization. J Comput Sci Technol 30:311–323
Hartzema AG et al (2013) Managing data quality for a drug safety surveillance system. Drug Saf 36:49–58
Godinho TM, Lebre R, Almeida JR, Costa C (2019) ETL framework for real-time business intelligence over medical imaging repositories. J Digit Imaging 32:870–879
Chandra P, Gupta MK (2018) Comprehensive survey on data warehousing research. Int J Inf Technol (Singapore) 10:217–224
Biswas N, Sarkar A, Mondal KC (2020) Efficient incremental loading in ETL processing for real-time data integration. Innov Syst Softw Eng 16:53–61
Sharon JA, Juliet S (2022) Efficient business intelligence implementation: a systematic review. In: 2022 international conference on applied artificial intelligence and computing (ICAAIC), pp 144–149. https://doi.org/10.1109/ICAAIC53929.2022.9793012
Tang H, Deng L, Huang Y (2022) Business intelligence system based on big data technology. In: 2022 international conference on artificial intelligence of things and crowdsensing (AIoTCs), pp 143–147. https://doi.org/10.1109/AIoTCs58181.2022.00027
Vijayalakshmi M, Minu RI (2022) Incremental load processing on ETL system through cloud. In: 2022 international conference for advancement in technology (ICONAT), pp 1–4. https://doi.org/10.1109/ICONAT53423.2022.9726039
Singhal B, Aggarwal A (2022) ETL, ELT and reverse ETL: a business case Study. In: 2022 Second International Conference on Advanced Technologies in Intelligent Control, Environment, Computing & Communication Engineering (ICATIECE), pp 1–4. https://doi.org/10.1109/ICATIECE56365.2022.10046997
Zhai D, He W (2010) An application of business intelligence based on patent in data integration and analysis. In: Proceedings - 2010 International Conference on Web Information Systems and Mining, WISM 2010, vol. 2, pp 288–292
Xie S, Huaichu C, Wuyue C, Zhen W (2018) Research on data integration based on kettle. In: Proceedings-9th international conference on information technology in medicine and education, ITME, Institute of Electrical and Electronics Engineers Inc., pp 948–951. https://doi.org/10.1109/ITME.2018.00211
Tiwari P, Kumar S, Mishra AC, Kumar V, Terfa B (2017) Improved performance of data warehouse. In: 2017 international conference on inventive communication and computational technologies (ICICCT), IEEE, pp 94–104
Sreemathy J et al. (2021) Overview of ETL tools and talend-data integration. In: 2021 7th international conference on advanced computing and communication systems, ICACCS, Institute of Electrical and Electronics Engineers Inc., pp 1650–1654. https://doi.org/10.1109/ICACCS51430.2021.9441984
Saada AI, El Khayat GA, Guirguis SK (2011) Cloud computing based ETL technique using warehouse intermediate agents. In: The 2011 international conference on computer engineering & systems, IEEE, pp 301–306
Sreemathy J et al. (2021) Data integration and ETL: a theoretical perspective. In: 2021 7th international conference on advanced computing and communication systems, ICACCS, Institute of Electrical and Electronics Engineers Inc., pp 1655–1660. https://doi.org/10.1109/ICACCS51430.2021.9441997
Singh M, Jain SK, Panchal VK (2014) An architecture of DSP tool for publishing the heterogeneous data in dataspace. In: Proceedings - 2014 13th international conference on information technology, ICIT, Institute of Electrical and Electronics Engineers Inc., pp 209–214. https://doi.org/10.1109/ICIT.2014.23
Mhon GGW, Kham NSM (2020) ETL Preprocessing with multiple data sources for academic data analysis. In: 2020 IEEE conference on computer applications (ICCA), pp 1–5
Martin A, Celma M (2011) Integrating human genome variation data: an information system approach. In: Proceedings - international workshop on database and expert systems applications, DEXA, pp 65–69. https://doi.org/10.1109/DEXA.2011.45
Lupa M, Sarlej W, Adamek K (2018) Harmonization of datasets in the frame of spatial data infrastructure using ETL tools: a case study of BDOT500 and BDOT10k databases. In: Proceedings - 2018 Baltic Geodetic Congress, BGC-Geomatics, Institute of Electrical and Electronics Engineers Inc., pp 217–220. https://doi.org/10.1109/BGC-Geomatics.2018.00047
DrCPriya Gj, Scholar R, Supervisor R (2020) Data integration with XML ETL processing. In: 2020 international conference on computer science, engineering and applications (ICCSEA)
Hajji M, Qbadou M, Mansouri K (2019) Towards the development of talend open studio components for the support of semantic sources. In: 2019 1st international conference on smart systems and data science (ICSSD), IEEE, pp 1–6
Luo J, Chen Y, Zeng Q (2010) The design and implementation of electric power data integration system based on the extraction-transformation-loading technology. In: 2010 international conference on management and service science, IEEE, pp 1–4
Deneke W, Li WN, Thompson C (2013) Automatic composition of ETL workflows from business intents. In: Proceedings-16th IEEE international conference on computational science and engineering, CSE, pp 1036–1042. https://doi.org/10.1109/CSE.2013.151
Belo O, Cuzzocrea A, Oliveira B (2014) Modeling and supporting ETL processes via a pattern-oriented, task-reusable framework. In: Proceedings-international conference on tools with artificial intelligence, ICTAI, IEEE Computer Society, vol. 2014, pp 960–966
Akbar R, Silvana M, Hersyah MH, Jannah M (2020) Implementation of business intelligence for sales data management using interactive dashboard visualization in XYZ stores. In: 2020 international conference on information technology systems and innovation, ICITSI 2020 – proceedings, Institute of Electrical and Electronics Engineers Inc., pp 242–249. https://doi.org/10.1109/ICITSI50517.2020.9264984
Sreemathy J, Joseph VI, Nisha S, Prabha IC, Priya RMG (2020) Data integration in ETL using TALEND. In: 2020 6th international conference on advanced computing and communication systems (ICACCS), pp 1444–1448
Balti H et al (2022) Multidimensional architecture using a massive and heterogeneous data: application to drought monitoring. Futur Gener Comput Syst 136:1–14
Ngo VM, Kechadi MT (2021) Electronic farming records – a framework for normalising agronomic knowledge discovery. Comput Electron Agric 184:106074
Gu R et al (2021) SparkDQ: efficient generic big data quality management on distributed data-parallel computation. J Parallel Distrib Comput 156:132–147
Souibgui M, Atigui F, Ben Yahia S, Si-Said Cherfi S (2022) An embedding driven approach to automatically detect identifiers and references in document stores. Data Knowl Eng 139:102003
Grzegorowski M et al (2021) Cost optimization for big data workloads based on dynamic scheduling and cluster-size tuning. Big Data Res 25:100203
Mia MR, Hoque ASML, Khan SI, Ahamed SI (2022) A privacy-preserving national clinical data warehouse: architecture and analysis. Smart Health 23:100238
Fernandes AX, Guimaraes P, Santos MY (2022) Big data analytics for vehicle multisensory anomalies detection. Proc Comput Sci 204:817–824
Saif S, Wazir S (2018) Performance analysis of big data and cloud computing techniques: a survey. Proc Comput Sci 132:118–127
Hu F et al (2018) ClimateSpark: an in-memory distributed computing framework for big climate data analytics. Comput Geosci 115:154–166
Qu W, Dessloch S (2017) Distributed snapshot maintenance in wide-column NoSQL databases using partitioned incremental ETL pipelines. Inf Syst 70:48–58
Marín-Ortega PM, Dmitriyev V, Abilov M, Gómez JM (2014) ELTA: new approach in designing business intelligence solutions in era of big data. Proc Technol 16:667–674
Ramos TG, Machado JCF, Cordeiro BPV (2015) Primary education evaluation in Brazil using big data and cluster analysis. Proc Comput Sci 55:1031–1039
Santoso LW (2017) Data warehouse with big data technology for higher education. Proc Comput Sci 124:93–99
Schokker D, Athanasiadis IN, Visser B, Veerkamp RF, Kamphuis C (2020) Storing, combining and analysing turkey experimental data in the big data era. Animal 14:2397–2403
Shang W, Adams B, Hassan AE (2012) Using pig as a data preparation language for large-scale mining software repositories studies: an experience report. J Syst Softw 85:2195–2204
Song J et al (2015) HaoLap: a hadoop based OLAP system for big data. J Syst Softw 102:167–181
Chang CH, Jiang FC, Yang CT, Chou SC (2019) On construction of a big data warehouse accessing platform for campus power usages. J Parallel Distrib Comput 133:40–50
Jenhani F, Gouider MS, Said LB (2019) Streaming social media data analysis for events extraction and warehousing using hadoop and storm: drug abuse case study. Proc Comput Sci 159:1459–1467
Jukic N, Jukic B, Sharma A, Nestorov S, Korallus Arnold B (2017) Expediting analytical databases with columnar approach. Decis Support Syst 95:61–81
Lin HC, Kuo YC, Liu MY (2020) A health informatics transformation model based on intelligent cloud computing – exemplified by type 2 diabetes mellitus with related cardiovascular diseases. Comput Methods Programs Biomed 191:105409
Mallek H, Ghozzi F, Teste O, Gargouri F (2018) BigDimETL with NoSQL database. Proc Comput Sci 126:798–807
Bimonte S, Ren L, Koueya N (2020) A linear programming-based framework for handling missing data in multi-granular data warehouses. Data Knowl Eng 128:101832
Fadiya SO, Saydam S, Zira VV (2014) Advancing big data for humanitarian needs. Proc Eng 78:88–95
Fotache M, Strimbei C (2015) SQL and data analysis. some implications for data analysits and higher education. Proc Econ Finance 20:243–251
Zdravevski E, Lameski P, Apanowicz C, Ślȩzak D (2020) From big data to business analytics: the case study of churn prediction. Appl Soft Comput J 90:106164
Wang H, Mu L, Shi F, Liu K, Qian Y (2019) Management and instant query of distributed oil and gas production dynamic data. Pet Explor Dev 46:1014–1021
Bala M, Boussaid O, Alimazighi Z (2017) A fine-grained distribution approach for ETL processes in big data environments. Data Knowl Eng 111:114–136
Sassi MSH (2016) A new architecture for cognitive internet of things and big data. Data Vault 2.0 159:1–15
Das D, Chakraborty C, Banerjee S (2020) A framework development on big data analytics for Terahertz Healthcare. Terahertz Biomedical and Healthcare Technologies. https://doi.org/10.1016/b978-0-12-818556-8.00007-0.
Golov N, Rönnbäck L (2017) Big Data normalization for massively parallel processing databases. Comput Stand Interf 54:86–93
Vieira AAC, Dias LMS, Santos MY, Pereira GAB, Oliveira JA (2019) Simulation of an automotive supply chain using big data. Comput Ind Eng 137:106033
Machado GV, Cunha Í, Pereira ACM, Oliveira LB (2019) DOD-ETL: distributed on-demand ETL for near real-time business intelligence. J Internet Serv Appl 10:1–15
Ong TC et al (2017) Dynamic-ETL: a hybrid approach for health data extraction, transformation and loading. BMC Med Inform Decis Mak 17:1–12
Yao Q et al (2015) Design and development of a medical big data processing system based on Hadoop. J Med Syst 39:1–11
Vossen G (2014) Big data as the new enabler in business and other intelligence. Vietnam J Comput Sci 1:3–14
Boulekrouche B, Jabeur N, Alimazighi Z (2016) Toward integrating grid and cloud-based concepts for an enhanced deployment of spatial data warehouses in cyber-physical system applications. J Ambient Intell Humaniz Comput 7:475–487
Wang H et al (2015) Efficient query processing framework for big data warehouse: an almost join-free approach. Front Comput Sci 9:224–236
Sebaa A, Chikh F, Nouicer A, Tari AK (2018) Medical big data warehouse: architecture and system design, a case study: improving healthcare resources distribution. J Med Syst 42:1–16
Belcastro L et al (2022) Programming big data analysis: principles and solutions. J Big Data 9:1–50
Fikri N, Rida M, Abghour N, Moussaid K, El Omri A (2019) An adaptive and real-time based architecture for financial data integration. J Big Data 6:1–25
Masciari E (2015) An end to end framework for building data cubes over trajectory data streams. J Intell Inf Syst 45:131–164
Lucero-Obusan C, Oda G, Mostaghimi A, Schirmer P, Holodniy M (2022) Public health surveillance in the U.S. department of Veterans affairs: evaluation of the Praedico surveillance system. BMC Public Health 22:272
Berisha B, Mëziu E, Shabani I (2022) Big data analytics in cloud computing: an overview. J Cloud Comput 11:24
Liu X, Heller A, Nielsen PS (2017) CITIESData: a smart city data management framework. Knowl Inf Syst 53:699–722
Qu W, Dessloch S (2014) A real-time materialized view approach for analytic flows in hybrid cloud environments. Datenbank-Spektrum 14:97–106
Lopes P, Oliveira JL (2015) An automated real-time integration and interoperability framework for bioinformatics. BMC Bioinf 16:1–13
Bajaber F et al (2016) Big data 2.0 processing systems: taxonomy and open challenges. J Grid Comput 14:379–405
Kathiravelu P, Sharma A, Galhardas H, Van Roy P, Veiga L (2019) On-demand big data integration: a hybrid ETL approach for reproducible scientific research. Distrib Parallel Databases 37:273–295
Choi WW, Ahn JW, Shin DB (2019) Study on the development of geo-spatial big data service system based on 7V in Korea. KSCE J Civ Eng 23:388–399
Cuzzocrea A, Ferreira N, Furtado P (2020) A rewrite/merge approach for supporting real-time data warehousing via lightweight data integration. J Supercomput 76:3898–3922
Boulila W, Farah IR, Hussain A (2018) A novel decision support system for the interpretation of remote sensing big data. Earth Sci Inform 11:31–45
Gröger C (2018) Building an industry 4.0 analytics platform. Datenbank-Spektrum 18:5–14
Jemmali R, Abdelhedi F, Zurfluh G (2022) DLToDW: transferring relational and NoSQL databases from a data lake. SN Comput Sci 3:381
Biswas N, Mondal AS, Kusumastuti A, Saha S, Mondal KC (2022) Automated credit assessment framework using ETL process and machine learning. Innov Syst Softw Eng. https://doi.org/10.1007/s11334-022-00522-x
Martins A, Abbasi M, Martins P, Sá F (2022) BigData oriented to business decision making: a real case study in constructel. Comput Math Organ Theory 28:271–291
Pallamala RK, Rodrigues P (2022) An investigative testing of structured and unstructured data formats in big data application using apache spark. Wirel Pers Commun 122:603–620
Mehmood E, Anees T (2022) Distributed real-time ETL architecture for unstructured big data. Knowl Inf Syst 64:3419–3445
Naeem MA, Waqar W, Mirza F, Tahir A (2022) TinyLFU-based semi-stream cache join for near-real-time data warehousing. Soft comput 26:11091–11103
Sakib N, Jamil SJ, Mukta SH (2022) A novel approach on machine learning based data warehousing for intelligent healthcare services. In: 2022 IEEE Region 10 symposium (TENSYMP), pp 1–5. https://doi.org/10.1109/TENSYMP54529.2022.9864564
Moura JYA, Cadersaib BZ (2022) Effort estimation method for extract transfer load (ETL) big data projects. In: 2022 2nd international conference on information technology and education (ICIT&E), pp 160–167. https://doi.org/10.1109/ICITE54466.2022.9759873
Sivabalan S, Minu RI (2021) Heterogeneous data integration with ELT and analytical MPP database for data analysis application. In: 2021 innovations in power and advanced computing technologies (i-PACT), pp 1–5. https://doi.org/10.1109/i-PACT52855.2021.9696841
Adnan Ilham AA, Usman S (2017) Performance analysis of extract, transform, load (ETL) in apache Hadoop atop NAS storage using ISCSI. In: 2017 4th International Conference on Computer Applications and Information Processing Technology (CAIPT), pp 1–5. https://doi.org/10.1109/CAIPT.2017.8320716
Zdravevski E, Lameski P, Dimitrievski A, Grzegorowski M, Apanowicz C (2019) Cluster-size optimization within a cloud-based ETL framework for Big Data. In: 2019 IEEE international conference on big data (Big Data), pp 3754–3763
Widanage C et al. (2020) High performance data engineering everywhere. In: Proceedings - 2020 IEEE international conference on smart data services, SMDS, Institute of Electrical and Electronics Engineers Inc., pp 122–132. https://doi.org/10.1109/SMDS49396.2020.00022
Suleykin A, Panfilov P (2020) Metadata-driven industrial-grade ETL system. In: Proceedings - 2020 IEEE international conference on big data, Big Data, Institute of Electrical and Electronics Engineers Inc., pp 2433–2442. https://doi.org/10.1109/BigData50022.2020.9378367
Tesfagiorgish DG, JunYi L (2015) Big data transformation testing based on data reverse engineering. In: 2015 IEEE 12th international conference on ubiquitous intelligence and computing and 2015 IEEE 12th international conference on autonomic and trusted computing and 2015 IEEE 15th international conference on scalable computing and communications and its associated workshops (UIC-ATC-ScalCom), IEEE, pp 649–652. https://doi.org/10.1109/UIC-ATC-ScalCom-CBDCom-IoP.2015.129
Samarasinghe R, Perera G, Perera N, Senaratna P, Samarasingha L (2017) People clues: business intelligence tool for team dynamics. In: 2017 seventeenth international conference on advances in ICT for emerging regions (ICTer), pp 1–6
Seay C, Agrawal R, Kadadi A, Barel Y (2015) Using Hadoop on the mainframe: a big solution for the challenges of big data. In: Proceedings-12th international conference on information technology: new generations, ITNG, Institute of Electrical and Electronics Engineers Inc., pp 765–769. https://doi.org/10.1109/ITNG.2015.135
Muthyala R et al. (2017) Data-driven job search engine using skills and company attribute filters. In: IEEE International Conference on Data Mining Workshops, ICDMW, vol. 2017, IEEE Computer Society, pp 199–206
Kim S-S, Yu S-H (2015) Architecture of geospatial big-data batch processing model based on Hadoop. In: 2015 international conference on information and communication technology convergence (ICTC), pp 964–966
Adilah S et al. (2017) The challenges of extract, transform and loading (ETL) system implementation for near real-time environment. In: 2017 international conference on research and innovation in information systems (ICRIIS) pp 1–5
Ma S et al. (2019) Bank big data architecture based on massive parallel processing database. In: Proceedings - 2018 15th international symposium on pervasive systems, algorithms and networks, I-SPAN, Institute of Electrical and Electronics Engineers Inc., pp 93–99. https://doi.org/10.1109/I-SPAN.2018.00024
Moatti Y et al. (2017) Too big to eat: boosting analytics data ingestion from object stores with scoop. In: Proceedings - international conference on data engineering, IEEE Computer Society, pp 309–320. https://doi.org/10.1109/ICDE.2017.243
Kholod II, Efimova MS (2017) Smart collection of data for financial instruments. In: 2017 XX IEEE international conference on soft computing and measurements (SCM), pp 705–708
Houari ME, Rhanoui M, Asri BE (2017) Hybrid big data warehouse for On-demand decision needs. In: 2017 international conference on electrical and information technologies (ICEIT), pp 1–6
Diouf PS, Boly A, Ndiaye S (2018) Variety of data in the ETL processes in the cloud: state of the art. In: International conference on innovative research and development (ICIRD), pp 1–5
Diouf PS, Boly A, Ndiaye S (2017) Performance of the ETL processes in terms of volume and velocity in the cloud: state of the art. In: 2017 4th IEEE international conference on engineering technologies and applied sciences (ICETAS), pp 1–5
Chou SC, Yang CT, Jiang FC, Chang CH (2018) The implementation of a data-accessing platform built from big data warehouse of electric loads. In: Proceedings - international computer software and applications conference, vol. 2, IEEE Computer Society, pp 87–92
Figueiras P et al. (2017) User interface support for a big ETL data processing pipeline: an application scenario on highway toll charging models. In: 2017 International conference on engineering, technology and innovation (ICE/ITMC), pp 1437–1444
Xu B, Zhu S, Yu J, Li C, Sun Q (2017) Designing ETL processes to integrate multi-field digital information resources. In: 2017 2nd international conference on image, vision and computing (ICIVC), pp 1053–1057
Deshpande PM, Margoor A, Venkatesh R (2018) Automatic tuning of SQL-on-Hadoop engines on cloud platforms. In: IEEE International Conference on Cloud Computing, CLOUD, vol. 2018, IEEE Computer Society, pp 508–515
Bala M, Boussaid O, Alimazighi Z (2014) P-ETL: parallel-ETL based on the mapreduce paradigm. In: 2014 IEEE/ACS 11th international conference on computer systems and applications (AICCSA), pp 42–49
Aluvalu R, Jabbar MA (2018) Handling data analytics on unstructured data using MongoDB. Smart Cities Symp 2018:1–5
Zeng YR, Chang YS, Fang YH (2019) Data visualization for air quality analysis on bigdata platform. In: 2019 international conference on system science and engineering (ICSSE), pp 313–317
Azqueta-Alzuaz A, Patino-Martinez M, Brondino I, Jimenez-Peris R (2017) Massive data load on distributed database systems over HBase. In: Proceedings - 2017 17th IEEE/ACM international symposium on cluster, cloud and grid computing, CCGRID, Institute of Electrical and Electronics Engineers Inc., pp 776–779. https://doi.org/10.1109/CCGRID.2017.124
Mehmood E, Anees T (2020) Challenges and solutions for processing real-time big data stream: a systematic literature review. IEEE Access 8:119123–119143. https://doi.org/10.1109/ACCESS.2020.3005268
Plazas JE et al (2022) Sense, transform & send for the internet of things (STS4IoT): UML profile for data-centric IoT applications. Data Knowl Eng 139:101971
Sanprasit N, Jampachaisri K, Titijaroonroj T, Kesorn K (2021) Intelligent approach to automated star-schema construction using a knowledge base. Expert Syst Appl 182:115226
Antunes AL, Cardoso E, Barateiro J (2022) Incorporation of ontologies in data warehouse/business intelligence systems - a systematic literature review. Int J Inf Manag Data Insights. https://doi.org/10.1016/j.jjimei.2022.100131
Deb Nath RP, Hose K, Pedersen TB, Romero O (2017) SETL: a programmable semantic extract-transform-load framework for semantic data warehouses. Inf Syst 68:17–43
Simitsis A, Skoutas D, Castellanos M (2010) Representation of conceptual ETL designs in natural language using Semantic Web technology. Data Knowl Eng 69:96–115
Teixeira MAC, Belloze KT, Cavalcanti MC, Silva-Junior FP (2018) Data mart construction based on semantic annotation of scientific articles: a case study for the prioritization of drug targets. Comput Methods Programs Biomed 157:225–235
Ta’a A, Abdullah MS (2011) Goal-ontology approach for modeling and designing ETL processes. Proc Comput Sci 3:942–948
Khouri S, Berkani N, Bellatreche L (2017) Tracing data warehouse design lifecycle semantically. Comput Stand Interf 51:132–151
Kang TW, Hong CH (2015) A study on software architecture for effective BIM/GIS-based facility management data integration. Autom Constr 54:25–38
Kilias T, Löser A, Andritsos P (2015) INDREX: in-database relation extraction. Inf Syst 53:124–144
Marco-Ruiz L, Moner D, Maldonado JA, Kolstrup N, Bellika JG (2015) Archetype-based data warehouse environment to enable the reuse of electronic health record data. Int J Med Inform 84:702–714
Mendoza M, Alegría E, Maca M, Cobos C, León E (2015) Multidimensional analysis model for a document warehouse that includes textual measures. Decis Support Syst 72:44–59
Selma K et al (2012) Ontology-based structured web data warehouses for sustainable interoperability: requirement modeling, design methodology and tool. Comput Ind 63:799–812
Nebot V, Berlanga R (2012) Building data warehouses with semantic web data. Decis Support Syst 52:853–868
Kraiem MB, Feki J, Khrouf K, Ravat F, Teste O (2015) Modeling and OLAPing social media: the case of Twitter. Soc Netw Anal Min 5:1–15
Salem R, Boussaïd O, Darmont J (2013) Active XML-based Web data integration. Inf Syst Front 15:371–398
Khouri S, Bellatreche L (2017) Design life-cycle-driven approach for data warehouse systems configurability. J Data Semant 6:83–111
Villarroya S, Viqueira JRR, Regueiro MA, Taboada JA, Cotos JM (2016) SODA: a framework for spatial observation data analysis. Distrib Parallel Databases 34:65–99
Araibi N, Ben Ahmed E, Karaa Ben Abdessalem W (2016) \(\mathcal {IRORS}\): intelligent recommendation of RSS feeds. Vietnam J Comput Sci 3:47–56
Boukhari I, Jean S, Ait-Sadoune I, Bellatreche L (2018) The role of user requirements in data repository design. Int J Softw Tools Technol Transf 20:19–34
Miyoshi NSB, Pinheiro DG, Silva WA, Felipe JC (2013) Computational framework to support integration of biomolecular and clinical data within a translational approach. BMC Bioinf 14:1–12
Moalla I, Nabli A, Bouzguenda L, Hammami M (2017) Data warehouse design approaches from social media: review and comparison. Soc Netw Anal Min. https://doi.org/10.1007/s13278-017-0423-8
Xu Y et al (2019) An information integration and transmission model of multi-source data for product quality and safety. Inf Syst Front 21:191–212
Sideridis S, Pelekis N, Theodoridis Y (2016) On querying and mining semantic-aware mobility timelines. Int J Data Sci Anal 2:29–44
Priyatna F, Alonso-Calvo R, Paraiso-Medina S, Corcho O (2017) Querying clinical data in HL7 RIM based relational model with morph-RDB. J Biomed Semant 8:1–12
Pressat-Laffouilhère T et al (2022) Evaluation of Doc’EDS: a French semantic search tool to query health documents from a clinical data warehouse. BMC Med Inform Decis Mak 22:34
Haberson A, Rinner C, Schöberl A, Gall W (2019) Feasibility of mapping Austrian health claims data to the OMOP common data model. J Med Syst 43:1–5
Omidvar A, Garakani M, Safarpour HR (2014) Context based user ranking in forums for expert finding using WordNet dictionary and social network analysis. Inf Technol Manag 15:51–63
Geibel P et al (2015) Ontology-based information extraction: identifying eligible patients for clinical trials in neurology. J Data Semant 4:133–147
Carrasco RA, Muñoz-Leiva F, Hornos MJ (2013) A multidimensional data model using the fuzzy model based on the semantic translation. Inf Syst Front 15:351–370
Girardi D, Dirnberger J, Giretzlehner M (2015) An ontology-based clinical data warehouse for scientific research. Safety in Health, vol. 1. http://www.safetyinhealth.com/content/1/1/6
Berkani N, Bellatreche L, Khouri S (2013) Towards a conceptualization of ETL and physical storage of semantic data warehouses as a service. Cluster Comput 16:915–931
Berkani N, Bellatreche L, Khouri S, Ordonez C (2020) The contribution of linked open data to augment a traditional data warehouse. J Intell Inf Syst 55:397–421
Lopes P, Luís Oliveira J (2012) COEUS: ‘semantic web in a box’ for biomedical applications. J Biomed Semant 3:1–19
Hanna J, Joseph E, Brochhausen M, Hogan WR (2013) Building a drug ontology based on RxNorm and other sources. J Biomed Semant 4:1–9
del Carmen Legaz-García M, Miñarro-Giménez JA, Menárguez-Tortosa M, Fernández-Breis JT (2016) Generation of open biomedical datasets through ontology-driven transformation and integration processes. J Biomed Semant 7:1–17
Essa YM, Attiya G, El-Sayed A, ElMahalawy A (2018) Data processing platforms for electronic health records. Health Technol 8:271–280
Pannarale P et al (2012) GIDL: a rule based expert system for GenBank intelligent data loading into the molecular biodiversity database. BMC Bioinf 13:1–14
Moalla I, Nabli A, Hammami M (2022) Data warehouse building to support opinion analysis in social media. Soc Netw Anal Min 12:123
Iksan LH et al. (2021) Implementation of cloud based action recognition backend platform. In: 2021 international conference on artificial intelligence and mechatronics systems (AIMS), pp 1–6. https://doi.org/10.1109/AIMS52415.2021.9466068
El Hafyani H, Abboud M, Taher Y (2021) A microservices based architecture for implementing and automating ETL data pipelines for mobile crowdsensing applications. In: 2021 IEEE international conference on big data (Big Data), pp 5909–5911. https://doi.org/10.1109/BigData52589.2021.9671382
Milev I, Zajc M (2022) Tangible information for active consumers: data from smart home device and smart meter become customer newsletters. In: 2022 30th telecommunications forum (TELFOR), pp 1–4. https://doi.org/10.1109/TELFOR56187.2022.9983708
Catovic A, Kadusic E, Ruland C, Zivic N, Hadzajlic N (2022) Air pollution prediction and warning system using IoT and machine learning. In: 2022 international conference on electrical, computer, communications and mechatronics engineering (ICECCME), pp 1–4. https://doi.org/10.1109/ICECCME55909.2022.9987957
Younes AB, Ayed LB, Najjar M (2022) Intelligent assistance with ML in data mapping ETL processing. In: 2022 IEEE Information Technologies & Smart Industrial Systems (ITSIS), pp 1–4. https://doi.org/10.1109/ITSIS56166.2022.10118369
Valtolina S, Ferrari L, Mesiti M (2019) Ontology-based consistent specification of sensor data acquisition plans in cross-domain iot platforms. IEEE Access 7:176141–176169
Onal AC, Berat Sezer O, Ozbayoglu M, Dogdu E (2017) Weather data analysis and sensor fault detection using an extended IoT framework with semantics, big data, and machine learning. In: 2017 IEEE international conference on big data (Big Data), pp 2037–2046
Sutheparaks U, Vatanawood W, Patanothai C (2011) Defining global schema for ETL of human resource performance appraisal system using REA ontology. In: 2011 eighth international joint conference on computer science and software engineering (JCSSE), IEEE, pp 275–280
Lee S, Park BH, Lim SH, Shankar M (2015) Table2Graph: a scalable graph construction from relational tables using map-reduce. In: Proceedings - 2015 IEEE 1st international conference on big data computing service and applications, BigDataService, Institute of Electrical and Electronics Engineers Inc., pp 294–301. https://doi.org/10.1109/BigDataService.2015.52
Nebot V, Berlanga R (2010) Populating data warehouses with semantic data. IEEE Lat Am Trans 8:150–157
Marx E, Shekarpour S, Auer S, Ngomo ACN (2013) Large-scale RDF dataset slicing. In: Proceedings - 2013 IEEE 7th international conference on semantic computing, ICSC, pp 228–235. https://doi.org/10.1109/ICSC.2013.47
McCarthy S, McCarren A, Roantree M (2019) A method for automated transformation and validation of online datasets. In: Proceedings - 2019 IEEE 23rd international enterprise distributed object computing conference, EDOC, Institute of Electrical and Electronics Engineers Inc., pp 183–189. https://doi.org/10.1109/EDOC.2019.00030
Jiang L, Cai H, Xu B (2010) A domain ontology approach in the ETL process of data warehousing. Proc - IEEE Int Conf E-Business Eng, ICEBE 2010:30–35. https://doi.org/10.1109/ICEBE.2010.36
Huang OR, Ou YL, Zhang MH, Zhang C (2012) Application of ontology-based automatic ETL in marine data integration. IEEE symposium on electrical & electronics engineering
Chang YS, Lin KM, Tsai YT, Zeng YR, Hun CX (2018) Big data platform for air quality analysis and prediction. In: 2018 27th wireless and optical communication conference (WOCC), pp 1–3
Berkani N, Bellatreche L, Ordonez C (2018) ETL-aware materialized view selection in semantic data stream warehouses. In: 2018 12th international conference on research challenges in information science (RCIS), pp 1–11
Abelló A et al (2015) Using semantic web technologies for exploratory OLAP: a survey. IEEE Trans Knowl Data Eng 27:571–588
Gollapudi S (2015) Aggregating financial services data without assumptions: a semantic data reference architecture. In: Proceedings of the 2015 IEEE 9th international conference on semantic computing (IEEE ICSC 2015), pp 312–315
Berkani N, Khouri S, Bellatreche L (2012) Generic methodology for semantic data warehouse design: From schema definition to ETL. In: Proceedings of the 2012 4th international conference on intelligent networking and collaborative systems, INCoS, pp 404–411. https://doi.org/10.1109/iNCoS.2012.108
Bansal SK (2014) Towards a semantic extract-transform-load (ETL) framework for big data integration. In: Proceedings - 2014 IEEE international congress on big data, BigData Congress 2014, Institute of Electrical and Electronics Engineers Inc., pp 522–529. https://doi.org/10.1109/BigData.Congress.2014.82
Abdellaoui S, Nader F (2015) Semantic data warehouse at the heart of competitive intelligence systems: design approach. In: 2015 6th international conference on information systems and economic intelligence (SIIE), IEEE
Hoppe T, Humm B, Reibold A (2018) Semantic applications: methodology, technology, corporate use. https://doi.org/10.1007/978-3-662-55433-3
Madsen MR (2009) The role of open source in data integration. Third nature Technology Report
Author information
Contributions
The study's inception and design involved input from all authors. CB and MRCL prepared the method, collected the data, and carried out the analysis. CB and HB wrote the manuscript's initial draft, and MRCL offered feedback on earlier drafts. ZB corrected, reviewed, and edited the manuscript. The final manuscript was reviewed and approved by all authors.
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Human and animal rights
This article does not contain any studies with human or animal subjects performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Boulahia, C., Behja, H., Chbihi Louhdi, M.R. et al. The multi-criteria evaluation of research efforts based on ETL software: from business intelligence approach to big data and semantic approaches. Evol. Intel. (2024). https://doi.org/10.1007/s12065-023-00899-z
DOI: https://doi.org/10.1007/s12065-023-00899-z