RSentiment: A Tool to Extract Meaningful Insights from Textual Reviews

  • Subhasree Bose
  • Urmi Saha
  • Debanjana Kar
  • Saptarsi Goswami
  • Amlan Kusum Nayak
  • Satyajit Chakrabarti
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 516)

Abstract

Every system needs continuous improvement, and feedback from different stakeholders plays a crucial role in it. A study of the literature establishes the need for textual feedback analysis in academic institutes. In fact, textual feedback is often more informative, more open ended, and more effective in producing actionable insights for decision makers than the more common score-based feedback (on a scale from 1 to n). However, extracting this information from textual feedback is not possible through traditional means of data analysis. Here we have conceptualized a tool that applies text mining techniques to elicit insights from textual data; it has been published as an open source package for broader use by practitioners. Appropriate visualization techniques are applied for intuitive understanding of the insights. For this, we have used a real dataset consisting of alumni feedback from a top engineering college in Kolkata.

Keywords

Textual feedback · Sentiment analysis · Topic models

1 Introduction

The volume of textual data has been growing for the last few years. Examples of textual data include reviews and feedback, emails, chats and transcripts, tweets, and blogs. Feedback and reviews have been used effectively to derive many significant insights, and they are critical for any dynamic system or process. Feedback can be quantitative, where users are asked to rate on a given scale of, say, 1 to 5, or where a specific question is asked and the participant chooses from options like Strongly Agree, Somewhat Agree, Neither Agree nor Disagree, Somewhat Disagree, and Strongly Disagree. On the other hand, it can be qualitative, where feedback about one or more areas of interest is sought in free text. A sentiment analysis strategy applied to the qualitative data can also yield actionable intelligence. Quantitative feedback can alternatively be thought of as structured data, whereas qualitative feedback represents unstructured data.

Undoubtedly, quantitative feedback has its own advantages. In paper [1], the shortcomings of quantitative feedback pointed out by the authors include that (i) good survey questions are hard to write and (ii) the data may provide a generic picture but lack depth. In the academic sector specifically, our own experience has been that a question on infrastructure may fetch a score of 3 out of 5, yet reveal nothing actionable. By contrast, a qualitative feedback item read: “Labs and Classrooms are okay. More flexible Library timings would help. Maybe some Cloud setup for doing complex and memory intensive/CPU intensive jobs.” It is immediately clear that the user is giving feedback on laboratories, classrooms, and the library. The challenge is that although textual feedback is more informative, identifying the aspects or features and then attaching a score to each is not trivial. These challenges motivate our current work. As observed in [2], a fixed questionnaire actually limits the user’s capacity to give feedback, because there may be various other aspects, outside the questionnaire, on which the user has an opinion.

Textual feedback has been successfully used in various domains, mostly in assessing customer feedback. In [3], the authors describe a tool with interactive visualization for hotel customers, while article [4] addresses the tourism industry. As acknowledged in [2], feedback is extremely important for academic institutes in maintaining quality. Text mining has been applied to track feedback in online learning systems, to extract concepts from learning materials [5], and for teacher evaluation. In education, various kinds of feedback are collected from different stakeholders like industry practitioners, faculty, students, parents, and alumni. Alumni form a very important part of any academic institute’s ecosystem. In this paper, alumni feedback in the form of textual remarks has been collected on topics like alumni interaction, placements, infrastructure, academic discipline, faculty, extracurricular activities, focus on R&D, and focus on entrepreneurship. Such remarks are more insightful than mere quantitative feedback. With the help of topic modeling coupled with sentiment analysis, finer aspects of each of these areas are identified, and simple visualization techniques are then used to interpret the results. However, as the data are textual, there may be semantic ambiguity as well as spelling errors.

The unique contributions of the paper are as follows:
  • The model is available as an open source package for use by all [6].

  • It produces better results than reference methods.

  • The dataset is real, and hence different engineering techniques to handle negation and degrees of sentiment are discussed.

  • The process is built generically, so it can be extended to any textual data.

The rest of the paper is organized as follows. In Sect. 2, we have discussed the related works in this area. In Sect. 3, proposed methodology is presented. In Sect. 4, different parameters of experimental setup are covered. In Sect. 5, the results are presented with necessary analysis. Section 6 contains the conclusion.

2 Related Works

In [7], the authors discuss a technique for evaluating teacher feedback sent over SMS. It consists of standard natural language processing (NLP) steps like part-of-speech (POS) tagging, named entity recognition (NER), and stemming. SMS text poses the additional challenge of misspelled or abbreviated words. Different concepts were extracted and sentiment analysis was performed. There is a strong need to identify the hidden topics or sub-themes in feedback; in paper [1], the authors propose an ontology-based solution to this problem. Funk and van Diggelen [8] present their findings from a large corpus of feedback about a specific course on industrial design. In [2], the authors collected free-text feedback about teachers; the usual preprocessing steps are followed by aspect identification and then sentiment analysis. The text-based method appears to extract feedback about many more features than are generally captured by a numeric score-based system. In paper [9], the authors work with MOOC comments; apart from traditional text preprocessing and sentiment analysis, a correlation analysis between various sentiments is also performed. In paper [10], the authors demonstrate how an NLP-based system can quickly identify major areas of concern in an e-learning system, without the need to go through volumes of survey data. It may be noted that many of the research works mentioned above are quite recent, from 2014 and 2015, which signifies the relevance of the current work.

3 Proposed Methodology

In this section, the proposed methodology is elaborated as a sequence of steps.

Data Collection: Data from alumni were collected using Google Forms. The form invited textual feedback on various dimensions: alumni interaction, placements, infrastructure, academic discipline, faculty, extracurricular activities, focus on R&D, and focus on entrepreneurship. These aspects are referred to as the academic dimensions in subsequent discussions.

Pre-processing: Standard techniques like whitespace and punctuation removal and spelling correction have been applied before sentiment analysis. For the sub-dimension sentiment analysis, normalization, stemming, and stop word removal have been performed before topic modeling.
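The cleaning operations in this step can be illustrated with a minimal Python sketch. This is not the package's actual implementation (which is in R and additionally corrects spellings); the function name `preprocess` is our own.

```python
import re
import string

def preprocess(text):
    """Lowercase a feedback string, strip punctuation, and collapse whitespace.

    Illustrative sketch of the cleaning step; the actual package performs
    the equivalent operations in R and also applies spelling correction.
    """
    text = text.lower()
    # Remove punctuation characters.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()
```

For example, `preprocess("Labs  and Classrooms are okay.")` yields `"labs and classrooms are okay"`.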

Topic Modeling: Various sub-themes, or aspects, of each dimension are extracted using topic modeling, a standard text mining process that discovers the hidden semantic structure of text. For a detailed understanding of topic models, see [11].

Sentiment Analysis: Sentiment analysis is performed at two levels. At level 1, it is performed directly on the academic dimensions; at level 2, on the sub-dimensions discovered through topic modeling. The sentiment analysis uses a lexicon-based method detailed in Sect. 4. Simple strategies have been adopted for handling negation and sarcasm.
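A lexicon-based scorer of this kind can be sketched as follows. The tiny word sets, the doubling rule for intensifiers, the two-word negation window, and the category thresholds are all our illustrative assumptions, standing in for the Liu et al. lexicon and the package's actual R rules, which differ in detail.

```python
# Illustrative stand-ins for the full positive/negative lexicon of Liu et al.
POSITIVE = {"good", "great", "helpful", "excellent"}
NEGATIVE = {"bad", "poor", "minimal", "less"}
NEGATIONS = {"no", "not", "never", "none"}
INTENSIFIERS = {"very", "absolutely", "extremely"}

def score_sentence(sentence):
    """Return an integer polarity score for one sentence.

    +1 per positive word, -1 per negative word; doubled when preceded by an
    intensifier, flipped when a negation occurs in the two preceding words.
    """
    words = sentence.lower().split()
    score = 0
    for i, w in enumerate(words):
        polarity = 1 if w in POSITIVE else (-1 if w in NEGATIVE else 0)
        if polarity:
            if i > 0 and words[i - 1] in INTENSIFIERS:
                polarity *= 2  # e.g. "very less" counts double
            if any(p in NEGATIONS for p in words[max(0, i - 2):i]):
                polarity = -polarity  # e.g. "not good" flips polarity
        score += polarity
    return score

def categorize(score):
    """Map a numeric score to one of the five sentiment categories."""
    if score <= -2:
        return "Very negative"
    if score == -1:
        return "Negative"
    if score == 0:
        return "Neutral"
    if score == 1:
        return "Positive"
    return "Very positive"
```

Under these assumed rules, `score_sentence("very less")` gives -2 and `categorize(-2)` gives "Very negative".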

Visualization: Simple visualizations are suggested: word clouds for identifying the topics discussed within an academic dimension, and bar charts to depict the distribution of sentiment scores.

The diagrammatic representation is shown in Fig. 1.
Fig. 1

Diagrammatic representation of proposed methodology

The entire methodology is available in a package named as ‘RSentiment’ [6] in the Comprehensive R Archive Network (CRAN).

4 Experimental Setup

The feedback was collected using Google Forms, and at the time of the analysis the number of respondents was well over 60. The survey was conducted anonymously to encourage candid inputs. ‘R’ [12] is used as the computational environment. For the lists of positive and negative words, the work of Liu et al. [13] was used; this is an exhaustive lexicon with 2000+ positive and 4000+ negative words. The intensity is assumed to increase from positive to very positive, or from negative to very negative, with the use of superlative or comparative words. A sentence or phrase is neutral if it contains no positive or negative words. Negations are treated in a simple lexicon-based manner, based on the occurrence of certain words or punctuation. For level 2, topic modeling is done after stop word removal and stemming. The most discussed topics were identified; for each topic, we extracted the feedback focusing on that topic within the academic dimension and applied the sentiment analysis algorithm to find the sentiment of the feedback on that topic.

For preprocessing, the packages ‘plyr’ [14] and ‘stringr’ [15] have been used; for topic modeling, ‘tm’ [16]; and for visualization, ‘shiny’ [17] and ‘wordcloud’ [18].

5 Results and Discussion

The results are organized in three subsections. In the first, we choose one academic dimension and analyze the feedback received about it: we count the feedback items under each sentiment category, assign a score to each sentence, plot the distribution, and present a word cloud to identify the most discussed topics in this area. In the second, we choose one academic dimension and analyze the sentiment of each of its most discussed topics to gain finer insights into that dimension. In the last, we compare our results with some other open source reference methods.

5.1 Overall Sentiment

In this section, we have chosen the academic dimensions ‘faculty’ and ‘placement’ and analyzed the feedback on them. The results of the sentiment analysis on these dimensions are shown in Table 1.
Table 1  Sentiment analysis results for academic dimensions ‘faculty’ and ‘placement’

  Sentiment category   Number of feedback (faculty)   Number of feedback (placement)
  Very negative        4                              4
  Negative             6                              6
  Neutral              13                             21
  Positive             22                             20
  Very positive        19                             13

The table above shows the sentiment analysis of the academic dimensions ‘faculty’ and ‘placement’. Both dimensions received 6 feedback items in the negative and 4 in the very negative category. ‘Faculty’ got 13 neutral feedback items while ‘placement’ got 21. There are 22 positive feedback items for ‘faculty’ and 20 for ‘placement’. In the very positive category, ‘faculty’ got 19 feedback items while ‘placement’ got 13. We thus obtain an overall sentiment for the academic dimensions of the college concerned, and to demonstrate this we have used the following visualization techniques:
  i. With the scores assigned to these feedback items, the following plots have been generated. Figures 2a and 3a show the number of feedback items in each assigned score category for the two dimensions, ‘faculty’ and ‘placement’ respectively.

     Fig. 2  a Histogram of the number of feedback items on dimension ‘faculty’ against feedback scores. b Word cloud generated from feedback on dimension ‘faculty’

     Fig. 3  a Histogram of the number of feedback items on dimension ‘placement’ against feedback scores. b Word cloud generated from feedback on dimension ‘placement’

  ii. Figures 2b and 3b show the respective word clouds for ‘faculty’ and ‘placement’, generated to highlight the sub-topics mentioned in the feedback text, giving insight into the areas of high concern within these academic dimensions.

An analysis of total positive opinions as a percentage of total opinions (excluding neutral) was performed. Alumni interaction, placements, infrastructure, academic discipline, faculty, extracurricular activities, focus on R&D, and focus on entrepreneurship score 48%, 77%, 71%, 61%, 80%, 38%, 50%, and 54% respectively. This gives the institute’s management a very convenient view of the strong and weak academic dimensions.
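This percentage can be reproduced directly from the category counts in Table 1; the helper name `positive_share` is ours.

```python
def positive_share(very_neg, neg, pos, very_pos):
    """Positive opinions as a percentage of all non-neutral opinions,
    rounded to the nearest whole percent (neutral feedback is excluded)."""
    positive = pos + very_pos
    total = very_neg + neg + positive
    return round(100 * positive / total)

# Counts taken from Table 1.
faculty = positive_share(very_neg=4, neg=6, pos=22, very_pos=19)    # 41 of 51 -> 80
placement = positive_share(very_neg=4, neg=6, pos=20, very_pos=13)  # 33 of 43 -> 77
```

This recovers the 80% (faculty) and 77% (placements) figures quoted above.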

5.2 Sentiment with Topic Modeling

In this section, we select one academic dimension, ‘infrastructure’, and divide it into the various sub-topics obtained by applying topic modeling to it.

We identified the most discussed topics by observing the frequency of their occurrence in the feedback and analyzed the sentiment toward each of them. Table 2 shows the distribution of feedback across the sub-topics and the sentiment categories.
Table 2  Distribution of feedback for sub-topics of the dimension ‘infrastructure’ across the five sentiment categories

  Sub-topic   Very negative   Negative   Neutral   Positive   Very positive
  Lab         2               0          2         1          5
  Classroom   1               0          0         0          2
  Library     0               0          0         0          0
  Seminar     1               0          0         3          2
  Computer    1               0          1         0          4

In the table above, we can see that ‘lab’ is the most discussed topic, with 2 very negative, 2 neutral, 1 positive, and 5 very positive feedback items. We thus gain a detailed insight into each academic dimension and its sub-topics extracted from the feedback text by topic modeling. Further analysis of extracurricular activities shows that the technical and cultural fests have received relatively fewer negative remarks than sports.
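The frequency-based selection of the most discussed topics can be sketched as a simple mention count. Note that the paper derives its sub-topics via topic modeling (the ‘tm’ package), not a fixed keyword list, so the `SUBTOPICS` set and the sample feedback strings below are illustrative assumptions.

```python
from collections import Counter

# Illustrative keyword set; the paper discovers these via topic modeling.
SUBTOPICS = {"lab", "classroom", "library", "seminar", "computer"}

def topic_counts(feedback_items):
    """Count how many feedback items mention each known sub-topic."""
    counts = Counter()
    for item in feedback_items:
        words = set(item.lower().split())
        for topic in SUBTOPICS:
            if topic in words:
                counts[topic] += 1
    return counts

# Hypothetical feedback strings for illustration.
feedback = [
    "the lab is well equipped",
    "lab hours are too short",
    "seminar hall needs better acoustics",
]
counts = topic_counts(feedback)  # 'lab' mentioned in 2 items, 'seminar' in 1
```

Once the counts are known, the items mentioning each frequent sub-topic are fed to the sentiment scorer, yielding the per-topic rows of Table 2.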

5.3 Comparison with Other Methods

The results were compared with qdap [19] and sentiment [20]. Table 3 provides comparisons, over phrases from the alumni interaction dimension, for the sentences whose polarity values did not match.
Table 3  Comparison with the qdap package

  Phrase                                                qdap package   Our method
  Very less                                             0              −2
  We got very less scope to interact with our seniors   0              −2
  No interaction                                        0              −1
  Has become very minimal                               0              −2
  Very less to none                                     0              −2
  Absolutely nil                                        0              −2

It is observed that the proposed methodology correctly classifies the polarity of these phrases; it appears especially effective for sentences containing negations. We also compared with sentiment [20]; however, it classifies a phrase only as positive or negative.

6 Conclusion

It is well established that textual feedback is very important for any academic institute and often produces superior insights compared with quantitative feedback. In this paper, a methodology has been proposed to perform sentiment analysis in conjunction with topic modeling on textual feedback. The proposed methodology is available as an open source package [6] in the Comprehensive R Archive Network (CRAN). Its efficacy has been tested on a substantial amount of alumni feedback collected using Google Forms. Our method can, firstly, identify how the institute is doing in each academic dimension and, secondly, with the help of topic modeling, identify the various sub-dimensions and the sentiment within each of them. As an example, the tool automatically identified several areas under infrastructure, namely lab, classroom, seminar facility, library, and computers. Some engineering improvements are applied to handle negation. The resulting visualizations can prove quite beneficial to the authorities. In particular, it may be noted that statements like ‘no interaction’ and ‘has become very minimal’ are often classified incorrectly by current open source tools. As an extension, we intend to test on bigger corpora of feedback and compare with more state-of-the-art techniques.

References

  1. P.K. Agrawal, A.S. Alvi, Textual feedback analysis: review, in International Conference on Computing Communication Control and Automation (ICCUBEA), pp. 457–460
  2. A. Kumar, R. Jain, Sentiment analysis and feedback evaluation, in 2015 IEEE 3rd International Conference on MOOCs, Innovation and Technology in Education (MITE) (IEEE, 2015), pp. 433–436
  3. Y. Wu, F. Wei, S. Liu, N. Au, W. Cui, H. Zhou, H. Qu, OpinionSeer: interactive visualization of hotel customer feedback. IEEE Trans. Visual. Comput. Graph. 16(6), 1109–1118 (2010)
  4. S.H. Liao, Y.J. Chen, M.Y. Deng, Mining customer knowledge for tourism new product development and customer relationship management. Expert Syst. Appl. 37(6), 4212–4223 (2010)
  5. C. Romero, S. Ventura, Educational data mining: a review of the state of the art. IEEE Trans. Syst. Man Cybern. Part C Appl. Rev. 40(6), 601–618 (2010)
  6. S. Bose, RSentiment: Analyse Sentiment of English Sentences. R package version 1.0.4 (2016), https://CRAN.R-project.org/package=RSentiment
  7. C.K. Leong, Y.H. Lee, W.K. Mak, Mining sentiments in SMS texts for teaching evaluation. Expert Syst. Appl. 39(3), 2584–2589 (2012)
  8. M. Funk, M. van Diggelen, Feeding a monster or doing good? Mining industrial design student feedback at large, in Proceedings of the 2014 Workshop on Interaction Design in Educational Environments (ACM, 2014), p. 59
  9. B.K.P. Conrad, A. Divinsky, Mining student-generated textual data in MOOCs and quantifying their effects on student performance and learning outcomes, in 2014 ASEE Annual Conference, Indianapolis, Indiana (2014)
  10. W.-B. Yu, R. Luna, Exploring user feedback of an e-learning system: a text mining approach, in Human Interface and the Management of Information. Information and Interaction for Learning, Culture, Collaboration and Business (Springer, Berlin, Heidelberg, 2013), pp. 182–191
  11. H.M. Wallach, Topic modeling: beyond bag-of-words, in Proceedings of the 23rd International Conference on Machine Learning (ACM, 2006), pp. 977–984
  12. R Core Team, R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2013). ISBN 3-900051-07-0, http://www.R-project.org/
  13. B. Liu, M. Hu, J. Cheng, Opinion observer: analyzing and comparing opinions on the web, in Proceedings of the 14th International World Wide Web Conference (WWW-2005), 10–14 May 2005, Chiba, Japan
  14. H. Wickham, The split-apply-combine strategy for data analysis. J. Stat. Softw. 40(1), 1–29 (2011), http://www.jstatsoft.org/v40/i01/
  15. H. Wickham, stringr: Simple, consistent wrappers for common string operations. R package version 1.0.0 (2015), https://CRAN.R-project.org/package=stringr
  16. I. Feinerer, K. Hornik, D. Meyer, Text mining infrastructure in R. J. Stat. Softw. 25(5), 1–54 (2008), http://www.jstatsoft.org/v25/i05/
  17. W. Chang, J. Cheng, J.J. Allaire, Y. Xie, J. McPherson, shiny: Web Application Framework for R. R package version 0.13.1 (2016), https://CRAN.R-project.org/package=shiny
  18. I. Fellows, wordcloud: Word Clouds. R package version 2.5 (2014), https://CRAN.R-project.org/package=wordcloud
  19. T.W. Rinker, qdap: Quantitative Discourse Analysis Package. Version 2.2.4 (University at Buffalo, Buffalo, New York, 2013), http://github.com/trinker/qdap
  20. T.P. Jurka, sentiment: Tools for sentiment analysis. R package version 0.2 (2012), https://CRAN.R-project.org/package=sentiment

Copyright information

© Springer Nature Singapore Pte Ltd. 2017

Authors and Affiliations

  • Subhasree Bose (1)
  • Urmi Saha (1)
  • Debanjana Kar (1)
  • Saptarsi Goswami (1)
  • Amlan Kusum Nayak (1)
  • Satyajit Chakrabarti (1)

  1. Institute of Engineering and Management, Kolkata, India
