SmallClient for big data: an indexing framework towards fast data retrieval

Siddiqa, Aisha; Karim, Ahmad; Chang, Victor

doi:10.1007/s10586-016-0712-4

Published: 20 December 2016

Cluster Computing, Volume 20, pages 1193–1208 (2017)

Abstract

Numerous applications continuously generate massive amounts of data, and it has become critical to extract useful information from that data while maintaining acceptable computing performance. The objective of this work is to design an indexing framework that minimizes indexing overhead and improves query execution and data search performance with optimum aggregation of computing performance. We propose SmallClient, an indexing framework that speeds up query execution. SmallClient has three modules: block creation, index creation and query execution. The block creation module improves data retrieval performance while keeping data uploading overhead to a minimum. The index creation module allows the maximum number of indexes to be built on a dataset, increasing the index hit ratio while minimizing indexing overhead. Finally, the query execution module enables incoming queries to use these indexes. The evaluation shows that SmallClient outperforms a Hadoop full scan by more than 90% in search performance. Meanwhile, the indexing overhead of SmallClient is reduced to approximately 50% and 80% for index size and indexing time, respectively.
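The abstract names three modules but does not describe their internal data structures. The sketch below is a hypothetical illustration of the general technique those module names suggest (partitioning data into blocks, building secondary indexes on several attributes per block, and answering queries through an index when one exists, otherwise by scanning). The fixed block size, the sorted (value, position) index, inclusive range queries, and every name in the code are assumptions for illustration, not SmallClient's actual design. It counts rows examined so that an index hit can be compared with a full scan.

```python
"""Hypothetical block-level indexing sketch (not SmallClient's actual code).

Assumptions: fixed-size blocks, one sorted (value, position) index per
attribute per block, and inclusive range queries. All names are illustrative.
"""
from bisect import bisect_left, bisect_right
from dataclasses import dataclass, field


@dataclass
class Block:
    rows: list                                   # records in this block (dicts)
    indexes: dict = field(default_factory=dict)  # attr -> (sorted keys, row positions)


def create_blocks(rows, block_size):
    """Block creation: partition the dataset into fixed-size blocks."""
    return [Block(rows[i:i + block_size]) for i in range(0, len(rows), block_size)]


def create_indexes(blocks, attributes):
    """Index creation: build a sorted index on every listed attribute in every block.

    Indexing more attributes lets more queries hit an index, at the cost of
    extra index space and build time.
    """
    for block in blocks:
        for attr in attributes:
            pairs = sorted((row[attr], pos) for pos, row in enumerate(block.rows))
            block.indexes[attr] = ([v for v, _ in pairs], [p for _, p in pairs])


def query(blocks, attr, lo, hi, use_index=True):
    """Query execution for lo <= attr <= hi.

    Uses the block's index when one exists (an index hit); otherwise scans
    every row in the block. Returns (matching rows, number of rows examined).
    """
    matches, examined = [], 0
    for block in blocks:
        index = block.indexes.get(attr) if use_index else None
        if index is None:
            examined += len(block.rows)  # full scan of this block
            matches.extend(r for r in block.rows if lo <= r[attr] <= hi)
        else:
            keys, positions = index
            start, end = bisect_left(keys, lo), bisect_right(keys, hi)
            examined += end - start      # only qualifying index entries are touched
            matches.extend(block.rows[p] for p in positions[start:end])
    return matches, examined


if __name__ == "__main__":
    rows = [{"id": i, "temp": i % 50} for i in range(1000)]
    blocks = create_blocks(rows, block_size=100)
    create_indexes(blocks, ["temp"])
    hits, n_indexed = query(blocks, "temp", 10, 12)
    _, n_scanned = query(blocks, "temp", 10, 12, use_index=False)
    print(len(hits), n_indexed, n_scanned)  # 60 60 1000
```

In this toy run, only `temp` is indexed, so a range query on `id` would still fall back to the scan path. Building indexes on more attributes is what raises the index hit ratio, in exchange for larger index size and longer indexing time.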

Author information

Authors and Affiliations

Faculty of Computer Science and Information Technology, University of Malaya, 50603, Kuala Lumpur, Malaysia
Aisha Siddiqa
Department of Information Technology, Bahauddin Zakariya University, Multan, 60000, Pakistan
Ahmad Karim
IBSS, Xi’an Jiaotong Liverpool University, Suzhou, 100044, China
Victor Chang

Corresponding author

Correspondence to Aisha Siddiqa.

About this article

Cite this article

Siddiqa, A., Karim, A. & Chang, V. SmallClient for big data: an indexing framework towards fast data retrieval. Cluster Comput 20, 1193–1208 (2017). https://doi.org/10.1007/s10586-016-0712-4

Received: 16 September 2016
Revised: 17 November 2016
Accepted: 02 December 2016
Published: 20 December 2016
Issue Date: June 2017
DOI: https://doi.org/10.1007/s10586-016-0712-4
