Inferring the Socioeconomic Status of Social Media Users Based on Behaviour and Language

  • Vasileios Lampos
  • Nikolaos Aletras
  • Jens K. Geyti
  • Bin Zou
  • Ingemar J. Cox
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9626)

Abstract

This paper presents a method for classifying social media users according to their socioeconomic status. Our experiments are conducted on a curated set of Twitter profiles, where each user is represented by their posted text, topics of discussion, interactive behaviour and estimated impact on the microblogging platform. We first formulate a 3-way classification task, in which users are classified as having an upper, middle or lower socioeconomic status. A nonlinear, generative learning approach using a composite Gaussian Process kernel provides significantly better classification accuracy (75%) than a competitive linear alternative. When the task is recast as a binary classification (upper vs. middle and lower class), the proposed classifier reaches an accuracy of 82%.
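
To illustrate the idea of a composite kernel over several user feature groups, the following is a minimal sketch and not the authors' implementation. It assumes one squared-exponential kernel per feature group, summed into a single kernel, and it swaps full Gaussian Process classification for a simpler stand-in: GP regression on one-vs-rest targets in {-1, +1} (least-squares classification). The data are random placeholders, the group names and column slices are hypothetical, and the hyperparameters are fixed rather than learned.

```python
import numpy as np


def se_kernel(A, B, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return variance * np.exp(-0.5 * sq_dists / length_scale**2)


def composite_kernel(A, B, groups):
    """Sum of one SE kernel per feature group (name -> column slice)."""
    return sum(se_kernel(A[:, cols], B[:, cols]) for cols in groups.values())


def fit_predict(X_train, y_train, X_test, groups, n_classes=3, noise=0.1):
    """GP regression on one-vs-rest labels; predict the highest-scoring class."""
    K = composite_kernel(X_train, X_train, groups)
    K += noise * np.eye(len(X_train))  # observation noise on the diagonal
    K_star = composite_kernel(X_test, X_train, groups)
    # Targets: +1 for the user's class, -1 for the other classes.
    Y = np.where(y_train[:, None] == np.arange(n_classes)[None, :], 1.0, -1.0)
    scores = K_star @ np.linalg.solve(K, Y)  # posterior mean per class
    return scores.argmax(axis=1)


# Hypothetical layout: columns grouped by the four user representations.
groups = {
    "text": slice(0, 2),
    "topics": slice(2, 4),
    "behaviour": slice(4, 5),
    "impact": slice(5, 6),
}

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 6))      # placeholder features, not real profiles
y = rng.integers(0, 3, size=60)   # 0 = lower, 1 = middle, 2 = upper

pred = fit_predict(X[:40], y[:40], X[40:], groups)
print(pred.shape)
```

Summing per-group kernels lets each representation contribute its own notion of similarity between users. A full treatment would learn a length scale and variance for each group and use a proper classification likelihood.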

Keywords

Social media, Twitter, User profiling, Socioeconomic status, Classification, Gaussian Process

Notes

Acknowledgements

This work has been supported by the EPSRC grant EP/K031953/1 (“Early-Warning Sensing Systems for Infectious Diseases”).

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Vasileios Lampos (1)
  • Nikolaos Aletras (1)
  • Jens K. Geyti (1)
  • Bin Zou (1)
  • Ingemar J. Cox (1, 2)

  1. Department of Computer Science, University College London, London, UK
  2. Department of Computer Science, University of Copenhagen, Copenhagen, Denmark
