Computing, Volume 98, Issue 11, pp 1137–1151

Storing shared documents that are customized by users in cloud computing

Abstract

This research introduces a new, efficient approach to storing shared text documents that users individually replicate and customize on a cloud server. The approach is applicable to all text file formats supported by any existing application. We assume a text document consists of both text and images. Instead of replicating the text of the original document, two tiny intermediate files are generated and stored for each user. The first file is a spreadsheet document that contains metadata about the user's customizations, including the highlighted and underlined parts of the document as well as other formatting styles such as strikethrough, bold and italic. The second file stores the comments provided by the user in a new “.comment” file format. Furthermore, the approach addresses the efficient storage of annotated images inside the document. For this purpose, an algorithm is proposed that stores only the difference between the annotated image and the original image. Whenever a user wants to access his/her customized document, the intermediate files are attached to the original document and the result is displayed to him/her. Finally, we evaluate the proposed approach through a real scenario. The experimental results show that a large amount of disk space is saved.
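The abstract does not spell out the image algorithm, so the following is only a minimal sketch of the general idea of storing the difference between an annotated image and the shared original. It assumes both images have the same dimensions and are held as nested lists of RGB tuples; the names `image_diff` and `apply_diff` and the dictionary-based diff format are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory representation: rows of (R, G, B) tuples.
Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
# Hypothetical diff format: (x, y) coordinate -> pixel value in the annotated copy.
Diff = Dict[Tuple[int, int], Pixel]


def image_diff(original: Image, annotated: Image) -> Diff:
    """Record only the pixels where the annotated copy differs from the original.

    Assumes both images have identical dimensions.
    """
    diff: Diff = {}
    for y, (row_orig, row_annot) in enumerate(zip(original, annotated)):
        for x, (p_orig, p_annot) in enumerate(zip(row_orig, row_annot)):
            if p_orig != p_annot:
                diff[(x, y)] = p_annot
    return diff


def apply_diff(original: Image, diff: Diff) -> Image:
    """Rebuild a user's annotated image from the shared original and the stored diff."""
    rebuilt = [list(row) for row in original]  # copy, so the shared original is untouched
    for (x, y), pixel in diff.items():
        rebuilt[y][x] = pixel
    return rebuilt


if __name__ == "__main__":
    white: Pixel = (255, 255, 255)
    red: Pixel = (255, 0, 0)
    original = [[white] * 4 for _ in range(3)]
    annotated = [list(row) for row in original]
    annotated[1][2] = red  # the user's annotation touches a single pixel

    diff = image_diff(original, annotated)
    print(len(diff), "changed pixel(s) stored instead of", 3 * 4)
    print(apply_diff(original, diff) == annotated)
```

Under these assumptions, only the changed pixels are kept per user, so the per-user cost grows with the size of the annotation rather than the size of the image.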

Keywords

Cloud computing, DaaS, Storage

Mathematics Subject Classification

68-06, 68M14, 68P20

Copyright information

© Springer-Verlag Wien 2016

Authors and Affiliations

  1. Pervasive and Cloud Computing Laboratory, Faculty of Electrical and Computer Engineering, University of Birjand, Birjand, Iran
