Chapter

Performance Characterization and Benchmarking. Traditional to Big Data

Volume 8904 of the series Lecture Notes in Computer Science, pp. 188-203

Composite Key Generation on a Shared-Nothing Architecture

  • Marie Hoffmann (corresponding author), DIMA, Technische Universität Berlin
  • Alexander Alexandrov, DIMA, Technische Universität Berlin
  • Periklis Andritsos, Institut des Systèmes d’Information, Université de Lausanne
  • Juan Soto, DIMA, Technische Universität Berlin
  • Volker Markl, DIMA, Technische Universität Berlin

Abstract

Generating synthetic data sets is integral to benchmarking, debugging, and simulating future scenarios. As data sets grow larger, realistic data characteristics become necessary for the success of new algorithms. Recently introduced software systems allow for truly parallel synthetic data generation. These systems use fast pseudorandom number generators and can handle complex schemas and uniqueness constraints on single attributes. Uniqueness is essential for forming keys, which identify single entries in a database instance. The uniqueness property is usually guaranteed by sampling from a uniform distribution and adjusting the sample size to the output size of the table such that there are no collisions. However, for real composite keys, where only the combination of the key attributes has the uniqueness property, a different strategy needs to be employed. In this paper, we present a novel approach to generating composite keys within a parallel data generation framework. We compute a joint probability distribution that incorporates the distributions of the key attributes and use the unique sequence positions of entries to address distinct values in the key domain.
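
The idea of using a unique sequence position to address a distinct value in the key domain can be illustrated with a minimal sketch. This is not the method from the paper: it assumes every key attribute has a finite domain of known size and decodes the position as a mixed-radix number, so it ignores the joint probability distribution that the paper incorporates. The function names `composite_key` and `generate_partition`, and the round-robin assignment of positions to workers, are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def composite_key(position: int, domain_sizes: Sequence[int]) -> Tuple[int, ...]:
    """Decode a sequence position into one value index per key attribute.

    Assumption: attribute i takes values 0 .. domain_sizes[i] - 1. Distinct
    positions always decode to distinct tuples (mixed-radix representation).
    """
    total = 1
    for size in domain_sizes:
        total *= size
    if not 0 <= position < total:
        raise ValueError("position lies outside the composite key domain")

    digits = []
    # Peel off the least significant attribute first.
    for size in reversed(domain_sizes):
        position, index = divmod(position, size)
        digits.append(index)
    return tuple(reversed(digits))


def generate_partition(
    worker: int, workers: int, rows: int, domain_sizes: Sequence[int]
) -> List[Tuple[int, ...]]:
    """Keys for one worker, which owns positions worker, worker + workers, ...

    No communication between workers is needed, because the sets of
    positions owned by different workers are disjoint.
    """
    return [composite_key(p, domain_sizes) for p in range(worker, rows, workers)]
```

Because each worker derives its keys from its own positions alone, the union of all partitions contains no duplicate composite key, even though an individual attribute value may repeat across entries.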