User profiling and satisfaction inference in public information access services

Journal of Intelligent Information Systems

Abstract

Public information access services are provided by dozens of countries around the world as a means to promote transparency and democracy, and they present a number of research opportunities for the development of computational models that help understand both users and their needs. Based on these observations, the present work discusses how Natural Language Processing (NLP) methods may harvest valuable knowledge about citizen-government communication in user profiling and satisfaction inference tasks. More specifically, from a large text dataset of this kind, we build a number of models using a range of supervised machine learning methods, including bidirectional long short-term memory networks (LSTMs), pre-trained context-sensitive embeddings (BERT) and others, and show that these outperform textual and non-textual baseline alternatives alike. This outcome makes a case in favour of NLP methods for these tasks and paves the way for further applications in the public information access domain.

Availability of Data and Material

https://drive.google.com/file/d/12sFdgipuK2d1QyrTlnv5QwFj1Gs5mdnI/view?usp=sharing

Code Availability

https://github.com/arthurmarcal/user-satisfaction-public-services

Notes

  1. https://www.foia.gov/

  2. http://esic.gov.br/

  3. Supported by the Brazilian access to information legislation and Decree 8777/16.

  4. https://esic.cgu.gov.br/

  5. For a comparison between content-based (e.g., sentiment) and response time features in online chat customer satisfaction, see also (Park et al., 2015).

References

  • Álvarez-Carmona, M., López-Monroy, A., Gómez, M. M., Villaseñor-Pineda, L., & Escalante, H. (2015). INAOE’s participation at PAN’15: Author profiling task. In CLEF 2015 (p. 9).

  • Ando, A., Masumura, R., Kamiyama, H., Kobashikawa, S., Aono, Y., & Toda, T. (2020). Customer satisfaction estimation in contact center calls based on a hierarchical multi-task model. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28, 715–728. https://doi.org/10.1109/TASLP.2020.2966857

  • Auguste, J., Charlet, D., Damnati, G., Bechet, F., & Favre, B. (2019). Can we predict self-reported customer satisfaction from interactions? In ICASSP 2019 - 2019 IEEE International conference on acoustics, speech and signal processing (ICASSP) (pp. 7385–7389). https://doi.org/10.1109/ICASSP.2019.8683896

  • Balage Filho, P.P., Aluísio, S.M., & Pardo, T. (2013). An evaluation of the Brazilian Portuguese LIWC dictionary for sentiment analysis. In 9th Brazilian symposium in information and human language technology - STIL (pp. 215–219). Fortaleza, Brazil.

  • Basile, A., Dwyer, G., Medvedeva, M., Rawee, J., Haagsma, H., & Nissim, M. (2017). N-GrAM: New Groningen author-profiling model. In Working notes of CLEF 2017 - conference and labs of the evaluation forum (p. 11). Dublin.

  • Berka, P. (2020). Sentiment analysis using rule-based and case-based reasoning. Journal of Intelligent Information Systems, 55, 51–66. https://doi.org/10.1007/s10844-019-00591-8.

  • Clifton-Sprigg, J., James, J., & Vujic, S. (2020). Freedom of Information (FOI) as a data collection tool for social scientists. PLoS ONE, 15(2), e0228392. https://doi.org/10.1371/journal.pone.0228392.

  • Custódio, J. E., & Paraboni, I. (2018). EACH-USP Ensemble cross-domain authorship attribution. In Working notes papers of the conference and labs of the evaluation forum (CLEF-2018), (Vol. 2125 p. 7). Avignon, France.

  • de Sousa, R.F., Anchiêta, R.T., & Nunes, M.d.G.V. (2020). A graph-based method for predicting the helpfulness of product opinions. iSys-Brazilian Journal of Information Systems, 13(4), 06–21.

  • Devlin, J., Chang, M., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In J. Burstein, C. Doran, & T. Solorio (Eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Vol. 1 (Long and Short Papers) (pp. 4171–4186). Association for Computational Linguistics.

  • dos Santos, V.G., Paraboni, I., & Silva, B.B.C. (2017). Big five personality recognition from multiple text genres. In Text, speech and dialogue (TSD-2017), Lecture notes in artificial intelligence (Vol. 10415, pp. 29–37). Czech Republic: Springer. https://doi.org/10.1007/978-3-319-64206-2_4

  • Felix, N., Soares, A., & Castro, P. (2020). Deep learning for named entity recognition in legal domain. Ph.D. thesis, Universidade Federal de Goiás. https://doi.org/10.13140/RG.2.2.34738.96961.

  • Flekova, L., Preoţiuc-Pietro, D., & Ungar, L. (2016). Exploring stylistic variation with age and income on Twitter. In Proceedings of the 54th annual meeting of the association for computational linguistics (Volume 2: Short Papers) (pp. 313–319). Berlin: Association for Computational Linguistics. https://doi.org/10.18653/v1/P16-2051

  • Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 32(3), 221–233.

  • Gallagher, C., Furey, E., & Curran, K. (2019). The application of sentiment analysis and text analytics to customer experience reviews to understand what customers are really saying. International Journal of Data Warehousing and Mining, 15(4). https://doi.org/10.4018/IJDWM.2019100102.

  • Goldberg, L.R. (1990). An alternative description of personality: The Big-Five factor structure. Journal of Personality and Social Psychology, 59, 1216–1229.

  • González-Gallardo, C., et al. (2015). Tweets classification using corpus dependent tags, character and POS N-grams. In CLEF 2015 (p. 11).

  • Hartmann, N., Fonseca, E., Shulby, C., Treviso, M., Rodrigues, J., & Aluísio, S. (2017). Portuguese word embeddings: Evaluating on word analogies and natural language tasks. In 11th Brazilian symposium in information and human language technology - STIL (pp. 122–131). Uberlândia, Brazil.

  • Higashinaka, R., Minami, Y., Dohsaka, K., & Meguro, T. (2010). Modeling user satisfaction transitions in dialogues from overall ratings. In Proceedings of the 11th annual meeting of the special interest group on discourse and dialogue, SIGDIAL ’10 (pp. 18–27). Stroudsburg, PA, USA: Association for Computational Linguistics.

  • Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.

  • Hsieh, F.C., Dias, R.F.S., & Paraboni, I. (2018). Author profiling from Facebook corpora. In 11th international conference on language resources and evaluation (LREC-2018) (pp. 2566–2570). Miyazaki, Japan: ELRA.

  • Isbister, T., Kaati, L., & Cohen, K. (2017). Gender classification with data independent features in multiple languages. In European intelligence and security informatics conference (EISIC-2017) (pp. 54–60). Athens, Greece: IEEE Computer Society.

  • Joulin, A., Grave, E., Bojanowski, P., & Mikolov, T. (2017). Bag of tricks for efficient text classification. In Proceedings of the 15th conference of the European chapter of the association for computational linguistics: Vol. 2, Short Papers (pp. 427–431). Valencia, Spain: Association for Computational Linguistics.

  • Kim, S.M., Xu, Q., Qu, L., Wan, S., & Paris, C. (2017). Demographic inference on Twitter using recursive neural networks. In Proceedings of ACL-2017, (pp. 471–477). Vancouver, Canada.

  • Kumar, S., & Zymbler, M. (2019). A machine learning approach to analyze customer satisfaction from airline tweets. Journal of Big Data, 6(62). https://doi.org/10.1186/s40537-019-0224-1.

  • Lennon, C., & Burdick, H. (2014). The Lexile framework as an approach for reading measurement and success. MetaMetrics, Durham, North Carolina, US.

  • Liu, F., Perez, J., & Nowson, S. (2017). A language-independent and compositional model for personality trait recognition from short texts. In Proceedings of EACL-2017 (pp. 754–764). Valencia, Spain: Association for Computational Linguistics.

  • Liu, Y., Bian, J., & Agichtein, E. (2008). Predicting information seeker satisfaction in community question answering. In Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval, SIGIR ’08 (pp. 483–490). USA: ACM. https://doi.org/10.1145/1390334.1390417

  • López-Santillán, R., Montes-Y-Gómez, M., González-Gurrola, L.C., Ramírez-Alonso, G., & Prieto-Ordaz, O. (2020). Richer document embeddings for author profiling tasks based on a heuristic search. Information Processing & Management, 57(4). https://doi.org/10.1016/j.ipm.2020.102227.

  • McLean, G., & Osei-Frimpong, K. (2017). Examining satisfaction with the experience during a live chat service encounter-implications for website providers. Computers in Human Behavior, 76, 494–508. https://doi.org/10.1016/j.chb.2017.08.005.

  • McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. New York: Cambridge University Press.

  • McNemar, Q. (1947). Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12(2), 153–157. https://doi.org/10.1007/BF02295996.

  • Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In C.J.C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K.Q. Weinberger (Eds.) Advances in neural information processing systems 26 (pp. 3111–3119). Curran Associates Inc.

  • Mikolov, T., Yih, W., & Zweig, G. (2013). Linguistic regularities in continuous space word representations. In Proc. of NAACL-HLT-2013 (pp. 746–751). Atlanta: Association for Computational Linguistics.

  • Myers, I.B., & Myers, P. (2010). Gifts differing: Understanding personality type. Hachette: Nicholas Brealey Publishing.

  • Nguyen, D.P., Trieschnigg, R.B., Dogruoz, A.S., Gravel, R., Theune, M., Meder, T., & de Jong, F.M. (2014). Why gender and age prediction from tweets is hard: Lessons from a crowdsourcing experiment. In Proceedings of COLING-2014 (pp. 1950–1961). Association for Computational Linguistics.

  • Pardo, F.M.R., Rosso, P., Potthast, M., & Stein, B. (2017). Overview of the 5th author profiling task at PAN 2017: Gender and language variety identification in Twitter. In Working notes of CLEF 2017 - conference and labs of the evaluation forum, (p. 26). Dublin.

  • Park, K., Kim, J., Park, J., Cha, M., Nam, J., Yoon, S., & Rhim, E. (2015). Mining the minds of customers from online chat logs. In CIKM ’15: Proceedings of the 24th ACM international on conference on information and knowledge management (pp. 1879–1882). https://doi.org/10.1145/2806416.2806621

  • Park, Y., & Gates, S.C. (2009). Towards real-time measurement of customer satisfaction using automatically generated call transcripts. In Proceedings of the 18th ACM conference on information and knowledge management, CIKM ’09 (pp. 1387–1396). USA: ACM. https://doi.org/10.1145/1645953.1646128

  • Pennebaker, J.W., Francis, M.E., & Booth, R.J. (2001). Linguistic inquiry and word count: LIWC. Mahwah, NJ: Lawrence Erlbaum.

  • Pennington, J., Socher, R., & Manning, C.D. (2014). GloVe: Global vectors for word representation. In Proceedings of EMNLP-2014 (pp. 1532–1543).

  • Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. In Proceedings of the 2018 conference of the North American chapter of the association for computational linguistics: human language technologies, Vol. 1 (Long Papers) (pp. 2227–2237). USA: Association for Computational Linguistics. https://doi.org/10.18653/v1/N18-1202

  • Pizarro, J. (2019). Using N-grams to detect Bots on Twitter. In L. Cappellato, N. Ferro, D. Losada, & H. Müller (Eds.) CLEF 2019 Labs and Workshops, Notebook Papers. CEUR-WS.org (p. 10).

  • Polignano, M., de Gemmis, M., & Semeraro, G. (2020). Contextualized BERT sentence embeddings for author profiling: The cost of performances. In Computational science and its applications (ICCSA)-2020, LNCS 12252 (pp. 135–149). Cham: Springer. https://doi.org/10.1007/978-3-030-58811-3_10

  • Preotiuc-Pietro, D., Liu, Y., Hopkins, D., & Ungar, L. (2017). Beyond binary labels: Political ideology prediction of Twitter users. In 55th annual meeting of the association for computational linguistics (pp. 729–740). Vancouver: Association for Computational Linguistics.

  • Price, S., & Hodge, A. (2020). Celebrity profiling using Twitter follower feeds. In Working notes of CLEF 2020 - conference and labs of the evaluation forum. CLEF and CEUR-WS.org, Thessaloniki, Greece.

  • Rangel, F., Celli, F., Rosso, P., Potthast, M., Stein, B., & Daelemans, W. (2015). Overview of the 3rd Author Profiling Task at PAN 2015. In CLEF 2015 Evaluation labs and workshop, (p. 8). Toulouse, France. CEUR-WS.org.

  • Rangel, F., & Rosso, P. (2019). Overview of the 7th author profiling task at PAN 2019: Bots and gender profiling. In L. Cappellato, N. Ferro, D. Losada, & H. Müller (Eds.) CLEF 2019 Labs and workshops, notebook papers. CEUR-WS.org (p. 36).

  • Rangel, F., Rosso, P., Montes-y-Gómez, M., Potthast, M., & Stein, B. (2018). Overview of the 6th Author Profiling Task at PAN 2018: Multimodal Gender Identification in Twitter. In L. Cappellato, N. Ferro, J.Y. Nie, & L. Soulier (Eds.) Working Notes Papers of the CLEF 2018 Evaluation Labs, CEUR Workshop Proceedings. CLEF and CEUR-WS.org (p. 38).

  • Rangel, F., Rosso, P., Zaghouani, W., & Charfi, A. (2020). Fine-grained analysis of language varieties and demographics. Natural Language Engineering, 1–21. https://doi.org/10.1017/S1351324920000108.

  • Scarton, C.E., & Aluísio, S.M. (2010). Análise da inteligibilidade de textos via ferramentas de processamento de língua natural: adaptando as métricas do Coh-Metrix para o português. Linguamática, 2(1), 45–61.

  • Silva, B.B.C., & Paraboni, I. (2018). Learning personality traits from Facebook text. IEEE Latin America Transactions, 16(4), 1256–1262. https://doi.org/10.1109/TLA.2018.8362165.

  • Singh, L.G., & Singh, S.R. (2021). Empirical study of sentiment analysis tools and techniques on societal topics. Journal of Intelligent Information Systems, 56, 379–407. https://doi.org/10.1007/s10844-020-00616-7.

  • Song, K., Bing, L., Gao, W., Lin, J., Zhao, L., Wang, J., Sun, C., Liu, X., & Zhang, Q. (2019). Using customer service dialogues for satisfaction analysis with context-assisted multiple instance learning. In Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP) (pp. 198–207). China: Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1019

  • Souza, F., Nogueira, R., & Lotufo, R. (2019). Portuguese named entity recognition using BERT-CRF. arXiv:1909.10649.

  • Takahashi, T., Tahara, T., Nagatani, K., Miura, Y., Taniguchi, T., & Ohkuma, T. (2018). Text and image synergy with feature cross technique for gender identification. In Working notes papers of the conference and labs of the evaluation forum (CLEF-2018). Avignon, France, (Vol. 2125 p. 12).

  • Tang, D., Qin, B., Liu, T., & Yang, Y. (2015). User modeling with neural network for review rating prediction. In Proceedings of the 24th international conference on artificial intelligence, IJCAI’15 (pp. 1340–1346). AAAI Press.

  • Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In I. Guyon, U.V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.) Advances in neural information processing systems 30 (pp. 5998–6008). Curran Associates Inc.

  • Verhoeven, B., Daelemans, W., & Plank, B. (2016). Twisty: a multilingual Twitter Stylometry corpus for gender and personality profiling. In 10th international conference on language resources and evaluation (LREC-2016) (pp. 1632–1637). Slovenia: ELRA.

  • Wolpert, D.H. (1992). Stacked generalization. Neural Networks, 5(2), 241–259.

  • Yom-Tov, G.B., Ashtar, S., Altman, D., Natapov, M., Barkay, N., Westphal, M., & Rafaeli, A. (2018). Customer sentiment in web-based service interactions: Automated analyses and new insights. In WWW ’18: companion proceedings of the the web conference 2018 (pp. 1689–1698). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE.

  • Zeng, Z., Luo, C., Shang, L., Li, H., & Sakai, T. (2018). Towards automatic evaluation of customer-helpdesk dialogues. Journal of Information Processing, 26, 768–778. https://doi.org/10.2197/ipsjjip.26.768.

  • Zhang, L., Wang, S., & Liu, B. (2018). Deep learning for sentiment analysis: A survey. WIREs Data Mining and Knowledge Discovery, 8(4), e1253.

Acknowledgements

The authors are grateful to Dr Sidney Evaldo Leal and Dr Sandra Maria Aluísio (USP) for the Coh-Metrix-Port feature extraction, and to Dr Elias Jacob de Menezes Neto (UFRN) for providing us with an early version of the present corpus. We also thank the anonymous reviewers for the valuable input to improve this article.

Funding

The third author received support from the University of São Paulo PRP grant nr. 668/2018.

Author information

Contributions

Arthur Flores: Conceptualisation, Methodology, Software, Validation, Formal analysis, Investigation, Writing - original draft, reviewing and editing.

Matheus Pavan: User profiling methodology, software, and validation.

Ivandré Paraboni: Conceptualisation, Writing - review and editing, supervision, project, fund acquisition.

Corresponding author

Correspondence to Ivandré Paraboni.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Flores, A.M., Pavan, M.C. & Paraboni, I. User profiling and satisfaction inference in public information access services. J Intell Inf Syst 58, 67–89 (2022). https://doi.org/10.1007/s10844-021-00661-w

  • DOI: https://doi.org/10.1007/s10844-021-00661-w
