Performance Improvement of Open Source Based Business Intelligence System Using Database Modeling and Outlier Detection
With today's technology, new data is generated every minute. For example, in 2000 the average computer hard disk held about 10 gigabytes, whereas Facebook now takes in about 500 terabytes of new data per day. Data is growing rapidly, but raw data by itself has little value. It is therefore important to extract from large volumes of data the information that will be useful in the future. Business intelligence (BI) systems analyze collected data to make predictions that support business decisions. However, the accuracy of a prediction depends on data quality, and in practice data is often of low quality, containing many incomplete and anomalous records. A further problem is that query response slows as data size increases. In previous work, we proposed a framework based on open-source technologies for BI systems that makes it possible to analyze big data efficiently, and we applied it to a supermarket's BI system. For that solution, we studied and successfully implemented the Hadoop data storage system, the Hive data warehouse software, the Sqoop data transfer tool, and other tools. In this paper, we add an anomaly detection stage to the proposed framework; by eliminating anomalies, it improves the information about related products that are purchased together. We also conduct an experimental study on speeding up time-dependent reports by applying the dimensional model to the Hive data warehouse. In the dimensional model, data is stored in the context of a single table (centralized context), whereas in the relational model the context is distributed over many tables. The experimental study shows that the dimensional model is more efficient: its query response time is at least two times faster than that of the data warehouse based on the relational model.
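The contrast between the two layouts can be illustrated with a small sketch. It is not the authors' schema or their Hive setup: an in-memory SQLite database stands in for the warehouse, and every table and column name (`product`, `receipt`, `receipt_line`, `sales_flat`) is a hypothetical choice made for illustration. It shows only that the same monthly report needs two joins in the relational layout and none in the single-table layout; it does not measure or reproduce the reported speedup.

```python
import sqlite3

# In-memory SQLite stands in for the Hive warehouse; all table and column
# names below are hypothetical and chosen only for illustration.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
-- Relational layout: the context of a sale is spread over several tables.
CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE receipt (receipt_id INTEGER PRIMARY KEY, sold_on TEXT);
CREATE TABLE receipt_line (receipt_id INTEGER, product_id INTEGER, amount REAL);

-- Single-table layout: one wide table holds the full context of each sale.
CREATE TABLE sales_flat (sold_month TEXT, product_name TEXT, amount REAL);
""")

cur.executemany("INSERT INTO product VALUES (?, ?)", [(1, "milk"), (2, "bread")])
cur.executemany("INSERT INTO receipt VALUES (?, ?)",
                [(10, "2018-01-05"), (11, "2018-01-20"), (12, "2018-02-03")])
cur.executemany("INSERT INTO receipt_line VALUES (?, ?, ?)",
                [(10, 1, 2.0), (10, 2, 1.5), (11, 1, 2.0), (12, 2, 1.5)])

# Pay the join cost once, at load time, so later reports need no joins.
cur.execute("""
INSERT INTO sales_flat
SELECT substr(r.sold_on, 1, 7), p.name, l.amount
FROM receipt_line l
JOIN receipt r ON r.receipt_id = l.receipt_id
JOIN product p ON p.product_id = l.product_id
""")

# Monthly sales per product against the relational layout: two joins.
relational_report = cur.execute("""
SELECT substr(r.sold_on, 1, 7) AS month, p.name, SUM(l.amount)
FROM receipt_line l
JOIN receipt r ON r.receipt_id = l.receipt_id
JOIN product p ON p.product_id = l.product_id
GROUP BY month, p.name
ORDER BY month, p.name
""").fetchall()

# The same report against the single wide table: no joins.
dimensional_report = cur.execute("""
SELECT sold_month, product_name, SUM(amount)
FROM sales_flat
GROUP BY sold_month, product_name
ORDER BY sold_month, product_name
""").fetchall()

print(relational_report == dimensional_report)
```

Both queries return the same rows; the difference lies in how much work each report has to repeat at query time.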
Keywords: BI system, Big data, Data warehouse, Data mining, Anomaly detection
This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning (No. 2017R1A2B4010826), by the Private Intelligence Information Service Expansion (No. C0511-18-1001) funded by the NIPA (National IT Industry Promotion Agency), by the Business for Cooperative R&D between Industry, Academy, and Research Institute funded by the Korea Small and Medium Business Administration in 2017 (Grant No. C0541451), and by the National Natural Science Foundation of China (61702324) in the People's Republic of China.
- 1. Big Data Explained. [Online]. Available: https://www.mongodb.com/BIG-DATA-EXPLAINED
- 2. Chee, T., Chan, L.K., Chuah, M.H., Tan, C.S., Wong, S.F., Yeoh, W.: Business intelligence systems: state-of-the-art review and contemporary applications. In: Symposium on Progress in Information & Communication Technology, pp. 16–30, Kuala Lumpur, Malaysia (2009)
- 3. Tan, P.N., Steinbach, M.: Introduction to Data Mining, 1st edn. Pearson Education Inc., Boston (2006)
- 11. Liu, K., Zhou, X., Feng, Y., Liu, J.: Clinical data preprocessing and case studies of POMDP for TCM treatment knowledge discovery. In: 14th International Conference on e-Health Networking, Applications and Services (Healthcom), pp. 10–14, IEEE (2012)
- 12. Amarbayasgalan, T., Bukhsuren, E., Namsrai, O., Ryu, K.H.: The approach of implementing business intelligence system: possibility to analyze big data. JARDCS (2), 775–779 (2018)
- 13. Alfredo, C.: Analytics over big data: exploring the convergence of data warehousing, OLAP and data-intensive cloud infrastructures. In: IEEE 37th Annual Computer Software and Applications Conference (2013)
- 14. Rogers, S.: Big data is scaling BI and analytics. Inf. Manage. 21(5), 14–20 (2011)
- 15. Li, D., Park, H.W., Batbaatar, E., Munkhdalai, L., Musa, I., Li, M., Ryu, K.H.: Application of a mobile chronic disease health-care system for hypertension based on big data platforms. J. Sens. (2018)
- 16. Apache Hadoop. https://hadoop.apache.org
- 17. Apache Hive. https://hive.apache.org
- 18. Apache Sqoop. http://sqoop.apache.org/