Personality Profiling from Text: Introducing Part-of-Speech N-Grams

  • Conference paper
User Modeling, Adaptation, and Personalization (UMAP 2014)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8538)

Abstract

A support vector machine is trained to classify the Five Factor personality of writers of free text. Writers are classified on each of the five personality dimensions as high/low, with the mean personality score for each dimension used as the dividing point. Writers are also separately classified as high/medium/low, with division points at one standard deviation above and below the mean. The two-class average accuracy of 80.6%, obtained using 5-fold cross validation, is much better than the baseline (pick the most likely class) accuracy of 50%, but the 3-class accuracy is only slightly better (7.4%) than baseline because most writers fall into the medium class due to the normal distribution of personality values. Features include bag of words, essay length, word sentiment, negation count, and part-of-speech (POS) n-grams. The consistently positive contribution of POS n-grams (averaging 4.8% and 5.8% for the 2-class and 3-class cases, respectively) is analyzed in detail. The information gain of the most predictive features for each of the five personality dimensions is presented and discussed.
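The class definitions and the baseline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the example scores, and the hand-written Penn Treebank-style tags are all hypothetical, the use of the population standard deviation is an assumption, and counting POS n-grams as contiguous tag tuples is only one plausible reading of the feature.

```python
from collections import Counter
from statistics import mean, pstdev


def two_class_labels(scores):
    """Label each score 'high' or 'low' relative to the sample mean."""
    m = mean(scores)
    return ["high" if s > m else "low" for s in scores]


def three_class_labels(scores):
    """Label each score 'high', 'medium', or 'low' using cut points
    one standard deviation above and below the mean.
    Assumption: population standard deviation (pstdev)."""
    m = mean(scores)
    sd = pstdev(scores)
    labels = []
    for s in scores:
        if s > m + sd:
            labels.append("high")
        elif s < m - sd:
            labels.append("low")
        else:
            labels.append("medium")
    return labels


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


def pos_ngrams(tags, n):
    """Count contiguous n-grams over a sequence of POS tags."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


# Hypothetical scores on one personality dimension (not data from the paper).
scores = [1.0, 2.0, 2.5, 2.8, 2.9, 3.1, 3.2, 3.5, 4.0, 5.0]
two = two_class_labels(scores)
three = three_class_labels(scores)

# Hypothetical tag sequence, assumed to come from some POS tagger.
tags = ["PRP", "VBP", "DT", "NN", "PRP", "VBP"]
bigrams = pos_ngrams(tags, 2)
```

With these made-up scores, the two-class split is balanced (baseline 0.5), while eight of the ten writers land in the medium class (baseline 0.8). For normally distributed scores roughly 68% of values lie within one standard deviation of the mean, which is why the 3-class baseline is hard to beat.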




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Wright, W.R., Chin, D.N. (2014). Personality Profiling from Text: Introducing Part-of-Speech N-Grams. In: Dimitrova, V., Kuflik, T., Chin, D., Ricci, F., Dolog, P., Houben, GJ. (eds) User Modeling, Adaptation, and Personalization. UMAP 2014. Lecture Notes in Computer Science, vol 8538. Springer, Cham. https://doi.org/10.1007/978-3-319-08786-3_21

  • DOI: https://doi.org/10.1007/978-3-319-08786-3_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08785-6

  • Online ISBN: 978-3-319-08786-3

  • eBook Packages: Computer Science, Computer Science (R0)
