GenD An Evolutionary System for Resampling in Survey Research

Meraviglia, Cinzia; Massini, Giulia; Croce, Daria; Buscema, Massimo

doi:10.1007/s11135-005-3264-x

Published: October 2006

Volume 40, pages 825–859, (2006)

Quality and Quantity

Cinzia Meraviglia¹,
Giulia Massini²,
Daria Croce³ &
Massimo Buscema²

Abstract

This paper is a preliminary research report. It presents a method for generating new records with an evolutionary algorithm (close to, but different from, a genetic algorithm). The method, called Pseudo-Inverse Function (P-I Function for short), was designed and implemented at Semeion Research Centre (Rome). It generates new (‘virtual’) data from a small set of observed data. P-I Function can help in three situations: when budget constraints limit the number of interviewees; when a population shows some sociologically interesting trait but is so small that the reliability of estimates can be seriously affected; and in secondary analysis of small samples.

The method applies to research designs with one or more dependent variables and a set of independent variables. New cases are estimated by maximizing a fitness function, and the procedure yields as many ‘virtual’ cases as needed, which reproduce the statistical traits of the original population. The algorithm used by P-I Function is the Genetic Doping Algorithm (GenD), designed and implemented by Semeion Research Centre. Among its features is an innovative crossover procedure, which tends to select individuals with average fitness values rather than those that show the best values at each ‘generation’.
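The idea of favouring average-fitness individuals for crossover can be illustrated with a toy sketch. This is not GenD's actual procedure, which the abstract does not specify. The function names `select_near_average` and `one_point_crossover`, the bit-list encoding, and the use of plain one-point crossover are all assumptions made only for illustration.

```python
import random


def select_near_average(population, fitness, k):
    """Return the k individuals whose fitness lies closest to the population mean.

    Assumption: a conventional genetic algorithm would rank by highest fitness;
    here the ranking key is the distance from the mean instead.
    """
    scores = [fitness(ind) for ind in population]
    mean = sum(scores) / len(scores)
    ranked = sorted(zip(population, scores), key=lambda pair: abs(pair[1] - mean))
    return [ind for ind, _ in ranked[:k]]


def one_point_crossover(parent_a, parent_b, rng):
    """Plain one-point crossover (a stand-in, not GenD's crossover rules)."""
    cut = rng.randrange(1, len(parent_a))
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


# Hypothetical population of bit lists; fitness is simply the number of ones.
population = [
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 1],
]
rng = random.Random(0)
parents = select_near_average(population, fitness=sum, k=3)
children = one_point_crossover(parents[0], parents[1], rng)
```

In this toy population the mean fitness is 2, so the fittest individual `[1, 1, 1, 1]` and the least fit `[0, 0, 0, 0]` are both left out of the mating pool.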

A particularly thorough research design was adopted: (1) the observed sample is split in half to obtain a training set and a testing set, which are analysed by means of a back-propagation neural network; (2) testing is performed to assess how good the parameter estimates are; (3) a 10% sample is randomly drawn from the training set and used as a reduced training set; (4) on this narrow basis, GenD calculates the pseudo-inverse of the estimated parameter matrix; (5) the ‘virtual’ data are tested against the testing set (which has never been used for training).
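The data-partitioning part of this design (the half split in step 1 and the 10% draw in step 3) can be sketched as follows. The neural network training and the GenD step are not covered. The helper names `half_split` and `reduced_training_set` are hypothetical, and the record count of 880 is an assumption chosen only so that the reduced set has 44 rows; the abstract does not state the size of the full sample.

```python
import random


def half_split(records, rng):
    """Shuffle a copy of the records and split it into two halves."""
    shuffled = list(records)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def reduced_training_set(training, fraction, rng):
    """Draw a simple random sample (without replacement) from the training set."""
    size = max(1, round(len(training) * fraction))
    return rng.sample(training, size)


rng = random.Random(0)
records = list(range(880))  # hypothetical record ids, not the real survey data

training, testing = half_split(records, rng)
reduced = reduced_training_set(training, 0.10, rng)

print(len(training), len(testing), len(reduced))
```

The testing half is set aside before any subsampling, so records in the reduced training set can never overlap with the data later used for evaluation.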

The algorithm was tested on particularly difficult ground, since the data set used as the basis for generating ‘virtual’ cases contains only 44 respondents, randomly sampled from a broader data set taken from the General Social Survey 2002. The main result is that networks trained on the ‘virtual’ resample show a model fit as good as that obtained with the observed data, although the ‘virtual’ and observed data differ in some features. It can be seen that GenD ‘refills’ the joint distribution of the independent variables, conditioned on the dependent one.

Author information

Authors and Affiliations

University of Eastern Piedmont, Alessandria, Italy
Cinzia Meraviglia
Semeion Research Centre of Sciences of Communication, Rome, Italy
Giulia Massini & Massimo Buscema
University of Milan-Bicocca, Milan, Italy
Daria Croce

Corresponding author

Correspondence to Cinzia Meraviglia.

Additional information

This paper is the result of deep collaboration among all authors. Cinzia Meraviglia wrote §1, 3, 4, 6, 7 and 8; Giulia Massini wrote §5; Daria Croce performed some elaborations with neural networks and linear regression; Massimo Buscema wrote §2.

About this article

Cite this article

Meraviglia, C., Massini, G., Croce, D. et al. GenD An Evolutionary System for Resampling in Survey Research. Qual Quant 40, 825–859 (2006). https://doi.org/10.1007/s11135-005-3264-x

Issue Date: October 2006
DOI: https://doi.org/10.1007/s11135-005-3264-x
