Empirical analysis of metrics for object oriented multidimensional model of data warehouse using unsupervised machine learning techniques

  • Original Article
  • International Journal of System Assurance Engineering and Management

Abstract

A data warehouse provides the foundation for businesses to make informed decisions about day-to-day operations and future strategy. Because this role is pivotal to the growth and success of the business, the quality of the data warehouse is critical. Conceptual models of data warehouses give considerable insight into the quality of the developed system during the early stages of the design process. Researchers have proposed a number of metrics to evaluate the quality of these object oriented multidimensional models. For these metrics to be used in practice, however, empirical evaluation is crucial. A number of proposals in the literature work towards empirical validation of metrics, but most of them are restricted to either statistical techniques or supervised machine learning techniques. Empirically validating the metrics requires collecting user responses for a number of schemas and recording observations to quantify model quality aspects such as understandability and efficiency. This can introduce personal biases, errors and random outliers, which affect the evaluation model. In this paper, we make a first attempt to assess the relationship between the structural metrics of object oriented multidimensional data warehouse models and the understandability of those models by using unsupervised machine learning techniques with the aid of a data warehouse quality expert. The results indicate that the proposed metrics have a strong relationship with understandability, and in turn with the quality of the data warehouse conceptual models, and that the unsupervised techniques are able to identify this relationship with a high degree of accuracy.
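
As a rough illustration of the kind of approach the abstract describes, the sketch below groups schemas by their structural metric values without using any understandability labels, and then attaches a rating to each cluster. It is a minimal sketch under stated assumptions, not the authors' method: the abstract does not name a clustering algorithm, so plain k-means is used here only as an example, and the metric names, the metric values, the choice of two clusters, and the rule standing in for the expert are all hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return labels, centroids


# Hypothetical metric vectors, one per schema:
# (number of classes, maximum hierarchy depth). Values are invented for illustration.
schemas = [(2, 1), (3, 1), (2, 2), (3, 2), (10, 6), (11, 7), (12, 6), (11, 8)]
labels, centroids = kmeans(schemas, k=2)

# Stand-in for the quality expert. In practice a person would inspect each
# cluster and label it; here the cluster with the smaller centroid is simply
# ASSUMED to be the more understandable one.
easier = min(range(2), key=lambda c: sum(centroids[c]))
ratings = [
    "more understandable" if lab == easier else "less understandable"
    for lab in labels
]

for schema, rating in zip(schemas, ratings):
    print(schema, rating)
```

The point of the design is that no per-schema user responses are needed to form the groups; a human judgement is applied once per cluster instead of once per schema.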

Author information

Corresponding author

Correspondence to Gargi Aggarwal.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Sabharwal, S., Nagpal, S. & Aggarwal, G. Empirical analysis of metrics for object oriented multidimensional model of data warehouse using unsupervised machine learning techniques. Int J Syst Assur Eng Manag 8 (Suppl 2), 703–715 (2017). https://doi.org/10.1007/s13198-016-0508-1

  • DOI: https://doi.org/10.1007/s13198-016-0508-1
