Optimized Data Placement for Column-Oriented Data Store in the Distributed Environment

  • Minqi Zhou
  • Chen Xu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6637)


Column-oriented data storage has become a buzzword for its high efficiency in massive data access, its high compression ratio on individual columns, and so on. However, these initial observations turn out not to be trivially true. The seek time and bandwidth of current hard disk drives (HDDs) have increasingly become the bottleneck for massive data processing, compared with the improvements made to other computer components over the past four decades. In this paper, we provide a novel data placement strategy for massive data analysis (i.e., read-optimized workloads) based on Gray code, which greatly increases the ratio of sequential access for diverse query evaluations (e.g., range queries, partial match range queries, and aggregation queries). A centralized/distributed structured index is employed on popularly deployed distributed file systems (e.g., GFS), which provides convenient management, efficient accessibility, and high extendibility. Detailed theoretical analysis of index extendibility, sequential access improvement, and storage capacity usage under the proposed data placement strategies is provided, along with specific algorithms. Our extensive experimental studies confirm the efficiency and effectiveness of the proposed data placement methods.
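
For readers unfamiliar with the Gray code mentioned in the abstract, the following is a minimal sketch of the standard binary-reflected Gray code, whose consecutive code words differ in exactly one bit. It is not the paper's placement algorithm, which the abstract does not spell out; the function names `to_gray`, `from_gray`, and `hamming_distance` are hypothetical and chosen only for illustration.

```python
def to_gray(n: int) -> int:
    """Map a non-negative integer to its binary-reflected Gray code."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Invert to_gray by XOR-folding all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# Gray codes for the integers 0..7 (3 bits).
codes = [to_gray(i) for i in range(8)]

# Neighbouring code words differ in exactly one bit.
adjacent_distances = [hamming_distance(a, b) for a, b in zip(codes, codes[1:])]

if __name__ == "__main__":
    for i, g in enumerate(codes):
        print(f"{i} -> {g:03b}")
```

The single-bit difference between neighbouring code words is the property that makes Gray code a natural candidate for ordering items so that logically adjacent ones stay close together; how the paper maps column values and storage locations onto such codes is described in the full text, not here.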


Range Query, Gray Code, Data Placement, Index Code, Aggregation Query
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Minqi Zhou (1)
  • Chen Xu (1)
  1. Massive Computing Institute, East China Normal University, Shanghai, China
