Random Query Answering with the Crowd

  • Original Article
Journal on Data Semantics

Abstract

Random data generators play an important role in computer science and engineering because they are used to simulate reality in IT systems. Software random data generators are not reliable enough for critical applications because they are intrinsically deterministic, while hardware random data generators are difficult to integrate into applications and are not always affordable. We present an approach that uses entropic data sources to perform random data generation. In particular, our approach exploits the chaotic phenomena that occur in the crowd. We extract these phenomena from social networks, since they reflect the behavior of the crowd. We have implemented the approach in a database system, RandomDB, to show its efficiency and flexibility compared with competing approaches. We ran RandomDB on data taken from Twitter, Facebook and Flickr. The experiments show that these social networks are suitable sources of reliable randomness and that RandomDB is a system that can be used for the task. We hope that our experience will drive the development of applications that reuse the same data in several different scenarios.
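The general idea of turning crowd-generated content into random bytes can be illustrated with a minimal sketch. This is not the RandomDB implementation, whose internals are not described in the abstract. It only assumes that a batch of crowd items (for example post texts with timestamps) is already available as strings, and it condenses them with a standard hash-based step. The function name `crowd_bytes` and the sample items are hypothetical.

```python
import hashlib
from typing import Iterable


def crowd_bytes(items: Iterable[str], n_bytes: int = 32) -> bytes:
    """Condense crowd-generated strings into n_bytes of output.

    Illustrative assumption only: the abstract does not say how RandomDB
    extracts randomness. Hashing does not create entropy, so the output
    is only as unpredictable as the input items are.
    """
    if n_bytes <= 0:
        raise ValueError("n_bytes must be positive")
    h = hashlib.shake_256()
    for item in items:
        data = item.encode("utf-8")
        # Length prefix keeps item boundaries unambiguous,
        # so ["ab", "c"] and ["a", "bc"] hash differently.
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.digest(n_bytes)


if __name__ == "__main__":
    # Hypothetical sample items standing in for freshly collected posts.
    sample = [
        "2015-03-01T10:15:02Z|user123|rainy morning in Rome",
        "2015-03-01T10:15:03Z|user987|new photo uploaded",
    ]
    print(crowd_bytes(sample, 16).hex())
```

In practice the quality of such output would still have to be checked with statistical test suites, since the sketch says nothing about how much entropy the collected items actually carry.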

Notes

  1. CAPTCHA: Completely Automated Public Turing test to tell Computers and Humans Apart.

  2. http://duolingo.com/.

  3. http://riwi.com/.

  4. http://www.mturk.com/.

  5. http://www.clickworker.com/.

  6. https://microworkers.com/.

  7. http://rapidworkers.com/.

  8. http://www.samasource.org/.

  9. http://microtask.com/.

  10. http://www.facebook.com/.

  11. http://www.twitter.com/.

  12. http://www.flickr.com/.

  13. http://technet.microsoft.com/en-us/library/cc962093.aspx.

  14. http://www.random.org/.

  15. http://sourceforge.net/projects/lavarnd/.

  16. http://www.fourmilab.ch/hotbits/.

  17. http://software.intel.com/.

  18. http://www.araneus.fi/products-alea-eng.html.

  19. http://www.protego.se/.

  20. http://www.letech.jpn.com/rng/index_rng_e.html.

  21. http://tectrolabs.com/.

  22. http://www.trng98.se/.

  23. http://comscire.com/.

  24. http://quintessencelabs.com/.

  25. http://www.idquantique.com/.

  26. http://www.flickr.com/services/api/.

  27. https://dev.twitter.com/.

  28. http://developers.facebook.com/.

  29. https://developer.linkedin.com/apis.

  30. http://www.stat.fsu.edu/pub/diehard/.

  31. http://www.phy.duke.edu/~rgb/General/dieharder.php.

  32. http://www.random.org/.

  33. In mathematics, a real-valued function f(x) defined on an interval is called convex if the line segment between any two points on the graph of the function lies on or above the graph.
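
The convexity condition in note 33 can equivalently be written as an inequality. This is the standard textbook form, with $I$ denoting the interval on which $f$ is defined (the symbols $I$, $x_1$, $x_2$ and $t$ are introduced here for illustration):

```latex
f\bigl(t x_1 + (1 - t) x_2\bigr) \le t\, f(x_1) + (1 - t)\, f(x_2)
\quad \text{for all } x_1, x_2 \in I,\ t \in [0, 1].
```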

Corresponding author

Correspondence to Roberto De Virgilio.

Cite this article

De Virgilio, R., Maccioni, A. Random Query Answering with the Crowd. J Data Semant 5, 3–17 (2016). https://doi.org/10.1007/s13740-015-0051-2

  • DOI: https://doi.org/10.1007/s13740-015-0051-2
