GenD An Evolutionary System for Resampling in Survey Research

Quality and Quantity

Abstract

This paper is a preliminary research report presenting a method for generating new records with an evolutionary algorithm (close to, but distinct from, a genetic algorithm). The method, called Pseudo-Inverse Function (P-I Function for short), was designed and implemented at the Semeion Research Centre (Rome). It generates new (‘virtual’) data from a small set of observed data. P-I Function can help when budget constraints limit the number of interviewees, when a population shows a sociologically interesting trait but is so small that the reliability of estimates is seriously affected, or in secondary analysis of small samples.

The method applies to research designs with one or more dependent variables and a set of independent variables. New cases are estimated by maximizing a fitness function, which yields as many ‘virtual’ cases as needed, reproducing the statistical traits of the original population. The algorithm used by P-I Function is known as the Genetic Doping Algorithm (GenD), designed and implemented by the Semeion Research Centre; among its features is an innovative crossover procedure, which tends to select individuals with average fitness values rather than those showing the best values at each ‘generation’.
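The contrast between selecting average-fitness individuals and selecting the fittest ones can be illustrated with a minimal sketch. This is not GenD's actual crossover procedure, which the abstract does not specify; the function names `select_near_average` and `select_best`, the "closest to the mean" rule and the fitness values are all hypothetical, chosen only to show the idea.

```python
from statistics import mean


def select_near_average(fitness, k):
    """Return the indices of the k individuals whose fitness is closest
    to the population mean (ties broken by index). Hypothetical rule,
    used here only to illustrate average-oriented selection."""
    avg = mean(fitness)
    ranked = sorted(range(len(fitness)), key=lambda i: (abs(fitness[i] - avg), i))
    return ranked[:k]


def select_best(fitness, k):
    """Return the indices of the k fittest individuals, as a conventional
    elitist selection would (ties broken by index)."""
    ranked = sorted(range(len(fitness)), key=lambda i: (-fitness[i], i))
    return ranked[:k]


# Made-up fitness scores for five individuals; the mean is 50.
fitness = [10, 40, 50, 65, 85]

near_avg = select_near_average(fitness, 2)  # individuals nearest the mean
best = select_best(fitness, 2)              # individuals with the top scores

print("near-average parents:", near_avg)
print("best-fitness parents:", best)
```

With these made-up scores the two rules pick disjoint pairs of parents, which is the point of the contrast: an average-oriented rule keeps mid-range individuals in play instead of concentrating reproduction on the current leaders.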

A particularly thorough research design has been adopted: (1) the observed sample is half-split to obtain a training and a testing set, which are analysed by means of a back propagation neural network; (2) testing is performed to assess how good the parameter estimates are; (3) a 10% sample is randomly drawn from the training set and used as a reduced training set; (4) on this narrow basis, GenD calculates the pseudo-inverse of the estimated parameter matrix; (5) the ‘virtual’ data are tested against the testing set (which has never been used for training).
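The sampling part of this design, steps (1) and (3), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: `split_design` is a hypothetical helper, the seed is arbitrary, and the 880 record IDs are placeholders chosen only so that the reduced set has 44 items, the count mentioned in the abstract; the actual size of the observed sample is not stated here.

```python
import random


def split_design(record_ids, reduced_fraction=0.10, seed=0):
    """Half-split the records into training and testing sets, then draw
    a random subsample of the training set as the reduced training set."""
    rng = random.Random(seed)           # fixed seed for reproducibility
    shuffled = list(record_ids)
    rng.shuffle(shuffled)

    half = len(shuffled) // 2
    training = shuffled[:half]          # step (1): training half
    testing = shuffled[half:]           # step (1): testing half, held out

    n_reduced = round(len(training) * reduced_fraction)
    reduced = rng.sample(training, n_reduced)   # step (3): 10% of training
    return training, testing, reduced


# Placeholder IDs standing in for survey respondents (assumed size).
training, testing, reduced = split_design(range(880))

print(len(training), len(testing), len(reduced))
```

The testing half never overlaps with the training half or with the reduced set drawn from it, which is what lets step (5) use it as an independent check on the ‘virtual’ data.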

The algorithm has been tested on particularly difficult ground, since the data set used as a basis for generating ‘virtual’ cases comprises only 44 respondents, randomly sampled from a broader data set taken from the General Social Survey 2002. The main result is that networks trained on the ‘virtual’ resample show a model fit as good as that obtained with the observed data, although ‘virtual’ and observed data differ in some features. GenD appears to ‘refill’ the joint distribution of the independent variables, conditioned on the dependent one.

Author information

Corresponding author

Correspondence to Cinzia Meraviglia.

Additional information

This paper is the result of close collaboration among all authors. Cinzia Meraviglia wrote §§ 1, 3, 4, 6, 7 and 8; Giulia Massini wrote § 5; Daria Croce performed some of the analyses with neural networks and linear regression; Massimo Buscema wrote § 2.

About this article

Cite this article

Meraviglia, C., Massini, G., Croce, D. et al. GenD An Evolutionary System for Resampling in Survey Research. Qual Quant 40, 825–859 (2006). https://doi.org/10.1007/s11135-005-3264-x

  • DOI: https://doi.org/10.1007/s11135-005-3264-x
