Comparing Hadoop and Fat-Btree Based Access Method for Small File I/O Applications

Luo, Min; Yokota, Haruo

doi:10.1007/978-3-642-14246-8_20

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6184)

Included in the following conference series:

International Conference on Web-Age Information Management

Abstract

Hadoop has been widely deployed on clusters to build scalable, high-performance distributed file systems. However, the Hadoop Distributed File System (HDFS) is designed for managing large files. In small-file applications, metadata requests flood the network and consume most of the Namenode's memory, which sharply degrades its performance. As a result, many web applications do not benefit from clusters that rely on a centralized metadata node, as Hadoop does. In this paper, we compare Hadoop with our Fat-Btree based data access method, which eliminates the central node from the cluster, and show how their performance differs across different file I/O applications.
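
To make the Namenode memory pressure concrete, the following is a rough back-of-the-envelope sketch rather than a measurement from this paper. It assumes the commonly cited rule of thumb that each file and each block occupies roughly 150 bytes of Namenode heap; that figure, the function name, and its parameters are illustrative assumptions, and directories are ignored.

```python
def namenode_metadata_bytes(num_files, blocks_per_file=1, bytes_per_object=150):
    """Estimate Namenode heap used by file and block metadata.

    Assumption (not from the paper): every file and every block is one
    in-memory object of about `bytes_per_object` bytes.
    """
    # One object for the file itself plus one object per block it owns.
    objects = num_files * (1 + blocks_per_file)
    return objects * bytes_per_object


# Ten million small files, each fitting in a single block.
small_files = namenode_metadata_bytes(10_000_000)

# The same number of blocks packed into ten thousand large files.
large_files = namenode_metadata_bytes(10_000, blocks_per_file=1000)

print(f"small files: {small_files / 1e9:.2f} GB of metadata")
print(f"large files: {large_files / 1e9:.2f} GB of metadata")
```

Under these assumptions, the same number of blocks costs about twice as much Namenode memory when stored as small files, and every one of those files also adds a metadata request to the single central node.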

Author information

Authors and Affiliations

Department of Computer Science, Tokyo Institute of Technology, 2–12–1 Ookayama, Meguro-ku, Tokyo, 152–8552, Japan
Min Luo
Global Scientific Information and Computing Center, Tokyo Institute of Technology, 2–12–1 Ookayama, Meguro-ku, Tokyo, 152–8550, Japan
Haruo Yokota

Editor information

Editors and Affiliations

Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
Lei Chen
Computer Department, Sichuan University, 610064, Chengdu, China
Changjie Tang
Department of Computer Science, Duke University, Box 90129, NC 27708-0129, Durham, USA
Jun Yang
College of Computer Science, Zhejiang University, 388 Yuhangtang Road, 310058, Hangzhou, China
Yunjun Gao

About this paper

Cite this paper

Luo, M., Yokota, H. (2010). Comparing Hadoop and Fat-Btree Based Access Method for Small File I/O Applications. In: Chen, L., Tang, C., Yang, J., Gao, Y. (eds) Web-Age Information Management. WAIM 2010. Lecture Notes in Computer Science, vol 6184. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14246-8_20

DOI: https://doi.org/10.1007/978-3-642-14246-8_20
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14245-1
Online ISBN: 978-3-642-14246-8
eBook Packages: Computer Science, Computer Science (R0)