User profiling and satisfaction inference in public information access services

Flores, Arthur Marçal; Pavan, Matheus Camasmie; Paraboni, Ivandré

doi:10.1007/s10844-021-00661-w

Published: 04 August 2021

Volume 58, pages 67–89, (2022)

Journal of Intelligent Information Systems

Arthur Marçal Flores¹,
Matheus Camasmie Pavan¹ &
Ivandré Paraboni ORCID: orcid.org/0000-0002-7270-1477¹

Abstract

Public information access services are provided by dozens of countries around the world as a means to promote transparency and democracy, and present a number of research opportunities for the development of computational models that help understand both users and their needs. Based on these observations, the present work discusses how the use of Natural Language Processing (NLP) methods may harvest valuable knowledge about citizen-government communication in user profiling and satisfaction inference tasks. More specifically, from a large text dataset of this kind, we build a number of models using a range of supervised machine learning methods - including bidirectional long short-term memory networks (LSTMs), pre-trained context-sensitive embeddings (BERT) and others - and show that these outperform textual and non-textual baseline alternatives alike. This outcome makes a case in favour of NLP methods for these tasks, and paves the way for further applications in the public information access domain.

Availability of Data and Material

https://drive.google.com/file/d/12sFdgipuK2d1QyrTlnv5QwFj1Gs5mdnI/view?usp=sharing

Code Availability

https://github.com/arthurmarcal/user-satisfaction-public-services

Notes

https://www.foia.gov/
http://esic.gov.br/
Supported by the Brazilian access information legislation and decree 8777/16.
https://esic.cgu.gov.br/
For a comparison between content-based (e.g., sentiment) and response time features in online chat customer satisfaction, see also (Park et al., 2015).

References

Álvarez-Carmona, M., López-Monroy, A., Gómez, M. M., Villaseñor-Pineda, L., & Escalante, H. (2015). INAOE’S participation at PAN’15: Author Profiling task. In CLEF 2015 (p. 9).
Ando, A., Masumura, R., Kamiyama, H., Kobashikawa, S., Aono, Y., & Toda, T. (2020). Customer satisfaction estimation in contact center calls based on a hierarchical multi-task model. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28, 715–728. https://doi.org/10.1109/TASLP.2020.2966857
Auguste, J., Charlet, D., Damnati, G., Bechet, F., & Favre, B. (2019). Can we predict self-reported customer satisfaction from interactions? In ICASSP 2019 - 2019 IEEE International conference on acoustics, speech and signal processing (ICASSP). https://doi.org/10.1109/ICASSP.2019.8683896 (pp. 7385–7389).
Balage Filho, P.P., Aluísio, S.M., & Pardo, T. (2013). An evaluation of the Brazilian Portuguese LIWC dictionary for sentiment analysis. In 9th Brazilian symposium in information and human language technology - STIL (pp. 215–219). Fortaleza, Brazil.
Basile, A., Dwyer, G., Medvedeva, M., Rawee, J., Haagsma, H., & Nissim, M. (2017). N-GrAM: New Groningen author-profiling model. In Working notes of CLEF 2017 - conference and labs of the evaluation forum, (p. 11). Dublin.
Berka, P. (2020). Sentiment analysis using rule-based and case-based reasoning. Journal of Intelligent Information Systems, 55, 51–66. https://doi.org/10.1007/s10844-019-00591-8.
Clifton-Sprigg, J., James, J., & Vujic, S. (2020). Freedom of Information (FOI) as a data collection tool for social scientists. PLoS ONE, 15(2), e0228392. https://doi.org/10.1371/journal.pone.0228392.
Custódio, J. E., & Paraboni, I. (2018). EACH-USP Ensemble cross-domain authorship attribution. In Working notes papers of the conference and labs of the evaluation forum (CLEF-2018), (Vol. 2125 p. 7). Avignon, France.
de Sousa, R.F., Anchiêta, R.T., & Nunes, M.d.G.V. (2020). A graph-based method for predicting the helpfulness of product opinions. iSys-Brazilian Journal of Information Systems, 13(4), 06–21.
Devlin, J., Chang, M., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. In J. Burstein, C. Doran, & T. Solorio (Eds.) Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, June 2-7, 2019, Vol. 1 (Long and Short Papers) (pp. 4171–4186). Association for Computational Linguistics.
dos Santos, V.G., Paraboni, I., & Silva, B.B.C. (2017). Big five personality recognition from multiple text genres. In Text, speech and dialogue (TSD-2017) lecture notes in artificial intelligence. https://doi.org/10.1007/978-3-319-64206-2_4, (Vol. 10415 pp. 29–37). Czech Republic: Springer.
Felix, N., Soares, A., & Castro, P. (2020). Deep learning for named entity recognition in legal domain. Ph.D. thesis, Universidade Federal de Goias. https://doi.org/10.13140/RG.2.2.34738.96961.
Flekova, L., Preoţiuc-Pietro, D., & Ungar, L. (2016). Exploring stylistic variation with age and income on Twitter. In Proceedings of the 54th annual meeting of the association for computational linguistics (Volume 2: Short Papers). https://doi.org/10.18653/v1/P16-2051 (pp. 313–319). Berlin: Association for Computational Linguistics.
Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 32(3), 221–233.
Gallagher, C., Furey, E., & Curran, K. (2019). The application of sentiment analysis and text analytics to customer experience reviews to understand what customers are really saying. International Journal of Data Warehousing and Mining 15(4). https://doi.org/10.4018/IJDWM.2019100102.
Goldberg, L.R. (1990). An alternative description of personality: The Big-Five factor structure. Journal of Personality and Social Psychology, 59, 1216–1229.
González-Gallardo, C., et al. (2015). Tweets classification using corpus dependent tags, character and POS N-grams. In CLEF 2015 (p. 11).
Hartmann, N., Fonseca, E., Shulby, C., Treviso, M., Rodrigues, J., & Aluísio, S. (2017). Portuguese word embeddings: Evaluating on word analogies and natural language tasks. In 11th Brazilian symposium in information and human language technology - STIL, (pp. 122–131). Uberlândia, Brazil.
Higashinaka, R., Minami, Y., Dohsaka, K., & Meguro, T. (2010). Modeling user satisfaction transitions in dialogues from overall ratings. In Proceedings of the 11th annual meeting of the special interest group on discourse and dialogue, SIGDIAL ’10 (pp. 18–27). USA: Association for Computational Linguistics, Stroudsburg, PA.
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
Hsieh, F.C., Dias, R.F.S., & Paraboni, I. (2018). Author profiling from Facebook corpora. In 11th international conference on language resources and evaluation (LREC-2018) (pp. 2566–2570). ELRA, Miyazaki, Japan.
Isbister, T., Kaati, L., & Cohen, K. (2017). Gender classification with data independent features in multiple languages. In European intelligence and security informatics conference (EISIC-2017) (pp. 54–60). Greece: IEEE Computer Society, Athens.
Joulin, A., Grave, E., Bojanowski, P., & Mikolov, T. (2017). Bag of tricks for efficient text classification. In Proceedings of the 15th conference of the european chapter of the association for computational linguistics: Vol. 2, Short Papers (pp. 427–431). Spain: Association for Computational Linguistics, Valencia.
Kim, S.M., Xu, Q., Qu, L., Wan, S., & Paris, C. (2017). Demographic inference on Twitter using recursive neural networks. In Proceedings of ACL-2017, (pp. 471–477). Vancouver, Canada.
Kumar, S., & Zymbler, M. (2019). A machine learning approach to analyze customer satisfaction from airline tweets. Journal of Big Data 6(62). https://doi.org/10.1186/s40537-019-0224-1.
Lennon, C., & Burdick, H. (2014). The Lexile framework as an approach for reading measurement and success. MetaMetrics, Durham, North Carolina, US.
Liu, F., Perez, J., & Nowson, S. (2017). A language-independent and compositional model for personality trait recognition from short texts. In Proceedings of EACL-2017 (pp. 754–764). Spain: Association for Computational Linguistics, Valencia.
Liu, Y., Bian, J., & Agichtein, E. (2008). Predicting information seeker satisfaction in community question answering. In Proceedings of the 31st annual international ACM SIGIR conference on research and development in information retrieval, SIGIR ’08. https://doi.org/10.1145/1390334.1390417 (pp. 483–490). USA: ACM.
López-Santillán, R., Montes-Y-Gómez, M., González-Gurrola, L.C., Ramírez-Alonso, G., & Prieto-Ordaz, O. (2020). Richer document embeddings for author profiling tasks based on a heuristic search. Information Processing & Management 57(4). https://doi.org/10.1016/j.ipm.2020.102227.
McLean, G., & Osei-Frimpong, K. (2017). Examining satisfaction with the experience during a live chat service encounter-implications for website providers. Computers in Human Behavior, 76, 494–508. https://doi.org/10.1016/j.chb.2017.08.005.
McNamara, D.S., Graesser, A.C., McCarthy, P.M., & Cai, Z. (2014). Automated evaluation of text and discourse with Coh-Metrix. New York: Cambridge University Press.
McNemar, Q. (1947). Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12(2), 153–157. https://doi.org/10.1007/BF02295996.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In C.J.C. Burges, L. Bottou, M. Welling, Z. Ghahramani, & K.Q. Weinberger (Eds.) Advances in neural information processing systems 26 (pp. 3111–3119). Curran Associates Inc.
Mikolov, T., Yih, W.-t., & Zweig, G. (2013). Linguistic regularities in continuous space word representations. In Proc. of NAACL-HLT-2013 (pp. 746–751). Atlanta: Association for Computational Linguistics.
Myers, I.B., & Myers, P. (2010). Gifts differing: Understanding personality type. Hachette: Nicholas Brealey Publishing.
Nguyen, D.P., Trieschnigg, R.B., Dogruoz, A.S., Gravel, R., Theune, M., Meder, T., & de Jong, F.M. (2014). Why gender and age prediction from tweets is hard: Lessons from a crowdsourcing experiment. In Proceedings of COLING-2014 (pp. 1950–1961). Association for Computational Linguistics.
Pardo, F.M.R., Rosso, P., Potthast, M., & Stein, B. (2017). Overview of the 5th author profiling task at PAN 2017: Gender and language variety identification in Twitter. In Working notes of CLEF 2017 - conference and labs of the evaluation forum, (p. 26). Dublin.
Park, K., Kim, J., Park, J., Cha, M., Nam, J., Yoon, S., & Rhim, E. (2015). Mining the minds of customers from online chat logs. In CIKM ’15: Proceedings of the 24th ACM international on conference on information and knowledge management. https://doi.org/10.1145/2806416.2806621 (pp. 1879–1882).
Park, Y., & Gates, S.C. (2009). Towards real-time measurement of customer satisfaction using automatically generated call transcripts. In Proceedings of the 18th ACM conference on information and knowledge management, CIKM ’09. https://doi.org/10.1145/1645953.1646128 (pp. 1387–1396). USA: ACM.
Pennebaker, J.W., Francis, M.E., & Booth, R.J. (2001). Linguistic inquiry and word count: LIWC. Lawrence Erlbaum, Mahwah NJ.
Pennington, J., Socher, R., & Manning, C.D. (2014). Glove: Global vectors for word representation. In Proceedings of EMNLP-2014 (pp. 1532–1543).
Peters, M., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. In Proceedings of the 2018 conference of the north american chapter of the association for computational linguistics: human language technologies, Vol. 1 (Long Papers). https://doi.org/10.18653/v1/N18-1202 (pp. 2227–2237). USA: Association for Computational Linguistics.
Pizarro, J. (2019). Using N-grams to detect Bots on Twitter. In L. Cappellato, N. Ferro, D. Losada, & H. Müller (Eds.) CLEF 2019 Labs and Workshops, Notebook Papers. CEUR-WS.org (p. 10).
Polignano, M., de Gemmis, M., & Semeraro, G. (2020). Contextualized BERT sentence embeddings for author profiling: The cost of performances. In Computational science and its applications (ICCSA)-2020, LNCS 12252. https://doi.org/10.1007/978-3-030-58811-3_10 (pp. 135–149). Cham: Springer.
Preotiuc-Pietro, D., Liu, Y., Hopkins, D., & Ungar, L. (2017). Beyond binary labels: Political ideology prediction of twitter users. In 55th annual meeting of the association for computational linguistics (pp. 729–740). Vancouver: Association for Computational Linguistics.
Price, S., & Hodge, A. (2020). Celebrity profiling using Twitter follower feeds. In Working notes of CLEF 2020 - conference and labs of the evaluation forum. CLEF and CEUR-WS.org, Thessaloniki, Greece.
Rangel, F., Celli, F., Rosso, P., Potthast, M., Stein, B., & Daelemans, W. (2015). Overview of the 3rd Author Profiling Task at PAN 2015. In CLEF 2015 Evaluation labs and workshop, (p. 8). Toulouse, France. CEUR-WS.org.
Rangel, F., & Rosso, P. (2019). Overview of the 7th author profiling task at PAN 2019: Bots and gender profiling. In L. Cappellato, N. Ferro, D. Losada, & H. Müller (Eds.) CLEF 2019 Labs and workshops, notebook papers. CEUR-WS.org (p. 36).
Rangel, F., Rosso, P., Montes-y-Gómez, M., Potthast, M., & Stein, B. (2018). Overview of the 6th Author Profiling Task at PAN 2018: Multimodal Gender Identification in Twitter. In L. Cappellato, N. Ferro, J.Y. Nie, & L. Soulier (Eds.) Working Notes Papers of the CLEF 2018 Evaluation Labs, CEUR Workshop Proceedings. CLEF and CEUR-WS.org (p. 38).
Rangel, F., Rosso, P., Zaghouani, W., & Charfi, A. (2020). Fine-grained analysis of language varieties and demographics. Natural Language Engineering 1–21. https://doi.org/10.1017/S1351324920000108.
Scarton, C.E., & Maria Aluísio, S. (2010). Análise da inteligibilidade de textos via ferramentas de processamento de língua natural: adaptando as métricas do Coh-Metrix para o português. Linguamática, 2(1), 45–61.
Silva, B.B.C., & Paraboni, I. (2018). Learning personality traits from Facebook text. IEEE Latin America Transactions, 16(4), 1256–1262. https://doi.org/10.1109/TLA.2018.8362165.
Singh, L.G., & Singh, S.R. (2021). Empirical study of sentiment analysis tools and techniques on societal topics. Journal of Intelligent Information Systems, 56, 379–407. https://doi.org/10.1007/s10844-020-00616-7.
Song, K., Bing, L., Gao, W., Lin, J., Zhao, L., Wang, J., Sun, C., Liu, X., & Zhang, Q. (2019). Using customer service dialogues for satisfaction analysis with context-assisted multiple instance learning. In Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP). https://doi.org/10.18653/v1/D19-1019 (pp. 198–207). China: Association for Computational Linguistics.
Souza, F., Nogueira, R., & Lotufo, R. (2019). Portuguese named entity recognition using bert-crf. arXiv:1909.10649.
Takahashi, T., Tahara, T., Nagatani, K., Miura, Y., Taniguchi, T., & Ohkuma, T. (2018). Text and image synergy with feature cross technique for gender identification. In Working notes papers of the conference and labs of the evaluation forum (CLEF-2018). Avignon, France, (Vol. 2125 p. 12).
Tang, D., Qin, B., Liu, T., & Yang, Y. (2015). User modeling with neural network for review rating prediction. In Proceedings of the 24th international conference on artificial intelligence, IJCAI’15 (pp. 1340–1346). AAAI Press.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In I. Guyon, U.V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, & R. Garnett (Eds.) Advances in neural information processing systems 30 (pp. 5998–6008). Curran Associates Inc.
Verhoeven, B., Daelemans, W., & Plank, B. (2016). Twisty: a multilingual Twitter Stylometry corpus for gender and personality profiling. In 10th international conference on language resources and evaluation (LREC-2016) (pp. 1632–1637). Slovenia: ELRA.
Wolpert, D.H. (1992). Stacked generalization. Neural Networks, 5(2), 241–259.
Yom-Tov, G.B., Ashtar, S., Altman, D., Natapov, M., Barkay, N., Westphal, M., & Rafaeli, A. (2018). Customer sentiment in web-based service interactions: Automated analyses and new insights. In WWW ’18: companion proceedings of the the web conference 2018 (pp. 1689–1698). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, CHE.
Zeng, Z., Luo, C., Shang, L., Li, H., & Sakai, T. (2018). Towards automatic evaluation of customer-helpdesk dialogues. Journal of Information Processing, 26, 768–778. https://doi.org/10.2197/ipsjjip.26.768.
Zhang, L., Wang, S., & Liu, B. (2018). Deep learning for sentiment analysis: A survey. WIREs Data Mining and Knowledge Discovery, 8(4), e1253.

Acknowledgements

The authors are grateful to Dr Sidney Evaldo Leal and Dr Sandra Maria Aluísio (USP) for the Coh-Metrix-Port feature extraction, and to Dr Elias Jacob de Menezes Neto (UFRN) for providing us with an early version of the present corpus. We also thank the anonymous reviewers for the valuable input to improve this article.

Funding

The third author received support from the University of São Paulo PRP grant nr. 668/2018.

Author information

Authors and Affiliations

University of São Paulo, Av Arlindo Bettio 1000, São Paulo, Brazil
Arthur Marçal Flores, Matheus Camasmie Pavan & Ivandré Paraboni

Contributions

Arthur Flores: Conceptualisation, Methodology, Software, Validation, Formal analysis, Investigation, Writing - original draft, reviewing and editing.

Matheus Pavan: User profiling methodology, software, and validation.

Ivandré Paraboni: Conceptualisation, Writing - review and editing, supervision, project, fund acquisition.

Corresponding author

Correspondence to Ivandré Paraboni.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Flores, A.M., Pavan, M.C. & Paraboni, I. User profiling and satisfaction inference in public information access services. J Intell Inf Syst 58, 67–89 (2022). https://doi.org/10.1007/s10844-021-00661-w

Received: 24 January 2021
Revised: 09 July 2021
Accepted: 11 July 2021
Published: 04 August 2021
Issue Date: February 2022
DOI: https://doi.org/10.1007/s10844-021-00661-w
