
The effect of human thought on data: an analysis of self-reported data in supervised learning and neural networks

  • Regular Paper
  • Published in: Progress in Artificial Intelligence

Abstract

While scientific applications can gather consistent data from the natural world, psychological, sociological, and even economic applications rely on data provided by people. Since most machine learning aims to improve people's lives, human input is essential to useful results. In this paper, we explore datasets in which both input and target attributes are provided by people taking surveys. Each of these survey datasets, generated from human input, is reliable and self-consistent according to Cronbach's alpha, and one expects a reliable questionnaire to provide effective data for learning. Our analysis shows this expectation to be false when applied to supervised learning. We apply both statistical analysis and several supervised learning architectures, with a focus on neural networks, to gain insight into data gathered through human input.
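As a concrete illustration of the reliability measure the abstract refers to (this is a sketch, not code from the paper), Cronbach's alpha for a survey scored as an (respondents × items) matrix is k/(k−1) · (1 − Σ item variances / variance of summed scores):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # per-question sample variance
    total_var = items.sum(axis=1).var(ddof=1)  # variance of each respondent's summed score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Perfectly consistent responses (every respondent answers all items
# identically) yield an alpha of 1, the maximum reliability.
consistent = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]])
print(cronbach_alpha(consistent))  # close to 1.0
```

Values near 1 indicate that the questionnaire's items measure the same underlying construct; a common rule of thumb treats alpha above roughly 0.7 as acceptable.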



Author information

Correspondence to Iren Valova.


About this article


Cite this article

Lovinger, J., Valova, I. The effect of human thought on data: an analysis of self-reported data in supervised learning and neural networks. Prog Artif Intell 6, 221–234 (2017). https://doi.org/10.1007/s13748-017-0118-4
