Storage Size Estimation for Schemaless Big Data Applications: A JSON-Based Overview

Swami, Devang; Sahoo, Bibhudatta

doi:10.1007/978-981-10-5523-2_29

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 19)

Abstract

Numerous technologies have been proposed for storing big data on the Cloud platform, but the choice among them is always application specific. Determining a suitable data model is a difficult task, which makes it necessary for architects and designers to review the requirements before choosing a solution. This paper presents 14 data models available in the market. More than 45 database solutions are available, most of which can be categorized into one of these data models, each applicable to its own set of use cases (a few products could not be categorized into any of the 14 data models). It has been observed that when schemaless information is stored, the size of the data in the database is larger than its original size. Metadata and the physical schema are the two factors responsible for this additional storage requirement. Mathematical models and experimental evaluations show that MongoDB requires many times more storage space than the original size of the data. A storage space estimation equation for JSON-based solutions is suggested, which expresses the storage requirement relative to the space required by CSV as a base. It may be used to estimate the approximate storage space an application requires before storage is purchased in the Cloud environment.
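The idea of using CSV as a baseline can be illustrated with a minimal sketch. It is not the estimation equation from the chapter, which is not reproduced in this preview, and it measures plain-text JSON rather than MongoDB's on-disk format. It only shows one source of overhead: in a schemaless store every record repeats its field names, whereas CSV states them once in the header. The function names and the sample record are hypothetical.

```python
import csv
import io
import json


def csv_size(rows, fields):
    """Bytes needed to store rows as CSV with a single header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return len(buf.getvalue().encode("utf-8"))


def json_lines_size(rows):
    """Bytes needed to store rows as one compact JSON object per line."""
    total = 0
    for row in rows:
        text = json.dumps(row, separators=(",", ":"))
        total += len(text.encode("utf-8")) + 1  # +1 for the newline
    return total


def key_overhead(rows):
    """Bytes spent on repeated field names: name + two quotes + colon."""
    return sum(len(key.encode("utf-8")) + 3 for row in rows for key in row)


if __name__ == "__main__":
    # Hypothetical record layout, chosen only for illustration.
    fields = ["vendor_id", "passenger_count", "trip_distance"]
    rows = [{"vendor_id": 1, "passenger_count": 2, "trip_distance": 1.5}] * 1000

    base = csv_size(rows, fields)
    actual = json_lines_size(rows)
    print("CSV bytes:       ", base)
    print("JSON bytes:      ", actual)
    print("JSON / CSV ratio:", round(actual / base, 2))
    print("Key-name bytes:  ", key_overhead(rows))
```

With short values and long field names, most of the JSON bytes are field names, so the ratio to CSV grows with the ratio of name length to value length. A binary format such as BSON adds type and length metadata on top of this, which this sketch does not model.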


Notes

1. MessagePack is a JSON-like format but comparatively smaller in size [22].
2. We use the term amortize because we do not consider the size of other characters such as commas, carriage returns, spaces for null values, and other special characters.
3. We do not include commas, other special characters, and null values since we are only after a rough estimate.

References

1. Whitehouse, O.: FEA consolidated reference model document (2005)
2. Codd, E.F.: A relational model of data for large shared data banks. Communications of the ACM 13(6), 377–387 (1970)
3. Gartner.com: Gartner report
4. Gibson, G.A., Vitter, J.S., Wilkes, J.: Strategic directions in storage I/O issues in large-scale computing. ACM Computing Surveys (CSUR) 28(4), 779–793 (1996)
5. Stonebraker, M., Hellerstein, J.: What goes around comes around. Readings in Database Systems 4 (2005)
6. Rabl, T., Gómez-Villamor, S., Sadoghi, M., Muntés-Mulero, V., Jacobsen, H.A., Mankovskii, S.: Solving big data challenges for enterprise application performance management. Proceedings of the VLDB Endowment 5(12), 1724–1735 (2012)
7. Demirkan, H., Delen, D.: Leveraging the capabilities of service-oriented decision support systems: Putting analytics and big data in cloud. Decision Support Systems 55(1), 412–421 (2013)
8. Chen, C.P., Zhang, C.Y.: Data-intensive applications, challenges, techniques and technologies: A survey on big data. Information Sciences 275, 314–347 (2014)
9. Kambatla, K., Kollias, G., Kumar, V., Grama, A.: Trends in big data analytics. Journal of Parallel and Distributed Computing 74(7), 2561–2573 (2014)
10. Ndbcluster size requirement estimator. https://dev.mysql.com/doc/refman/5.7/en/mysql-cluster-programs-ndb-size-pl.html, accessed: 2016-09-30
11. Hardware sizing calculator. https://neo4j.com/hardware-sizing/, accessed: 2016-09-30
12. Padhy, R.P., Patra, M.R., Satapathy, S.C.: RDBMS to NoSQL: reviewing some next-generation non-relational databases. International Journal of Advanced Engineering Science and Technologies 11(1), 15–30 (2011)
13. Aho, A.V., Sethi, R., Ullman, J.D.: Compilers, Principles, Techniques. Addison-Wesley, Boston (1986)
14. Bollacker, K., Evans, C., Paritosh, P., Sturge, T., Taylor, J.: Freebase: a collaboratively created graph database for structuring human knowledge. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data. pp. 1247–1250. ACM (2008)
15. World Wide Web Consortium, et al.: JSON-LD 1.0: a JSON-based serialization for linked data (2014)
16. Finn, R.D., Mistry, J., Tate, J., Coggill, P., Heger, A., Pollington, J.E., Gavin, O.L., Gunasekaran, P., Ceric, G., Forslund, K., et al.: The Pfam protein families database. Nucleic Acids Research p. gkp985 (2009)
17. del Alba, L.: Data serialization comparison: JSON, YAML, BSON, MessagePack. https://www.sitepoint.com/data-serialization-comparison-json-yaml-bson-messagepack/, accessed: 2016-09-26
18. Cook, K.B., Kazan, H., Zuberi, K., Morris, Q., Hughes, T.R.: RBPDB: a database of RNA-binding specificities. Nucleic Acids Research 39(suppl 1), D301–D308 (2011)
19. Cranford, K.: How to excel with SAS. In: Proceedings of the 28th Annual SCSUG Conference, Austin, Texas, September (2007)
20. Shafranovich, Y.: Common format and MIME type for comma-separated values (CSV) files (2005)
21. Sharma, T.C., Jain, M.: WEKA approach for comparative study of classification algorithm. International Journal of Advanced Research in Computer and Communication Engineering 2(4), 1925–1931 (2013)
22. MessagePack. http://msgpack.org/index.html, accessed: 2016-09-26
23. NYC Taxi & Limousine Commission: TLC yellow taxi trip record data. http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml, accessed: 2016-09-30
24. DB-engines.com: DBMS rankings 2017 (2016)

Author information

Authors and Affiliations

National Institute of Technology Rourkela, Rourkela, 769008, Odisha, India
Devang Swami & Bibhudatta Sahoo

Corresponding author

Correspondence to Devang Swami.

Editor information

Editors and Affiliations

Department of Computer Science and Information Management, Providence University, Taichung City, Taiwan
Yu-Chen Hu
CSED, ABES Engineering College, Ghaziabad, Uttar Pradesh, India
Shailesh Tiwari
Department of Computer Science and Engineering, Motilal Nehru National Institute of Technology Allahabad, Allahabad, Uttar Pradesh, India
Krishn K. Mishra
Department of Computer Science and Engineering, ABES Engineering College, Ghaziabad, India
Munesh C. Trivedi

About this paper

Cite this paper

Swami, D., Sahoo, B. (2018). Storage Size Estimation for Schemaless Big Data Applications: A JSON-Based Overview. In: Hu, YC., Tiwari, S., Mishra, K., Trivedi, M. (eds) Intelligent Communication and Computational Technologies. Lecture Notes in Networks and Systems, vol 19. Springer, Singapore. https://doi.org/10.1007/978-981-10-5523-2_29

DOI: https://doi.org/10.1007/978-981-10-5523-2_29
Published: 24 October 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-5522-5
Online ISBN: 978-981-10-5523-2
eBook Packages: Engineering, Engineering (R0)
