
The effect of human thought on data: an analysis of self-reported data in supervised learning and neural networks

  • Regular Paper
  • Published in: Progress in Artificial Intelligence

Abstract

While scientific applications can gather consistent data from the natural world, psychological, sociological, and even economic applications rely on data provided by people. Since most machine learning aims to improve people's lives, human input is essential to useful results. In this paper, we explore datasets in which both input and target attributes are provided by people taking surveys. Each of these survey datasets, generated from human input, is reliable and self-consistent according to Cronbach's alpha, and one expects a reliable questionnaire to provide effective data for learning. Our analysis shows this expectation to be false when applied to supervised learning. We apply both statistical analysis and several supervised learning architectures, with a focus on neural networks, to gain insight into data gathered through human input.
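As a concrete illustration of the reliability measure the abstract refers to (this is a sketch, not code from the paper), Cronbach's alpha for a survey scored as an (respondents × items) matrix is k/(k−1) · (1 − Σ item variances / variance of summed scores):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # per-question sample variance
    total_var = items.sum(axis=1).var(ddof=1)  # variance of each respondent's summed score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Perfectly consistent responses (every respondent answers all items
# identically) yield an alpha of 1, the maximum reliability.
consistent = np.array([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]])
print(cronbach_alpha(consistent))  # close to 1.0
```

Values near 1 indicate that the questionnaire's items measure the same underlying construct; a common rule of thumb treats alpha above roughly 0.7 as acceptable.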



Author information

Correspondence to Iren Valova.


About this article


Cite this article

Lovinger, J., Valova, I. The effect of human thought on data: an analysis of self-reported data in supervised learning and neural networks. Prog Artif Intell 6, 221–234 (2017). https://doi.org/10.1007/s13748-017-0118-4
