Abstract
A data warehouse provides the foundation for businesses to make informed decisions about day-to-day operations and future strategy. Because this role is pivotal to the growth and success of a business, the quality of the data warehouse is critical. Conceptual models of data warehouses give insight into the quality of the developed system during the early stages of the design process. Researchers have proposed a number of metrics to evaluate the quality of these object oriented multidimensional models. For these metrics to be used in practice, empirical evaluation is crucial. Several proposals in the literature work towards empirical validation of metrics, but most of them are restricted to either statistical techniques or supervised machine learning techniques. To empirically validate the metrics, user responses must be collected for a number of schemas, and observations recorded to quantify model quality aspects such as understandability and efficiency. This can introduce personal biases, errors and random outliers, which affect the evaluation model. In this paper, we make a first attempt to assess the relationship between the structural metrics of object oriented multidimensional data warehouse models and the understandability of those models by using unsupervised machine learning techniques with the aid of a data warehouse quality expert. The results indicate that the proposed metrics have a strong relationship with understandability, and in turn with the quality of data warehouse conceptual models, and that the unsupervised techniques are able to identify this relationship with a high degree of accuracy.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Sabharwal, S., Nagpal, S. & Aggarwal, G. Empirical analysis of metrics for object oriented multidimensional model of data warehouse using unsupervised machine learning techniques. Int J Syst Assur Eng Manag 8 (Suppl 2), 703–715 (2017). https://doi.org/10.1007/s13198-016-0508-1
DOI: https://doi.org/10.1007/s13198-016-0508-1