RSentiment: A Tool to Extract Meaningful Insights from Textual Reviews
Every system needs continuous improvement, and feedback from different stakeholders plays a crucial role here. The literature establishes the need for textual feedback analysis in academic institutes. In fact, textual feedback is often more informative, more open ended and more effective in producing actionable insights for decision makers than the more common score-based feedback (on a scale of 1 to n). However, extracting this information from textual feedback is not possible through traditional means of data analysis. Here we have conceptualized a tool that applies text mining techniques to elicit insights from textual data; it has been published as an open source package for broader use by practitioners. Appropriate visualization techniques are applied for an intuitive understanding of the insights. For this, we have used a real dataset consisting of alumni feedback from a top engineering college in Kolkata.
Keywords: Textual feedback · Sentiment analysis · Topic models
1 Introduction
The growth of textual data has been on the rise for the last few years. Examples of textual data include reviews or feedback, emails, chats or transcripts, tweets and blogs. Feedback and reviews have been used effectively to derive many significant insights, and they are critical for any dynamic system or process. Feedback can be quantitative, where users are asked to rate on a given scale of (say) 1 to 5, or where a specific question is asked and the participant chooses from options such as Strongly Agree, Somewhat Agree, Neither Agree nor Disagree, Somewhat Disagree and Strongly Disagree. On the other hand, it can be qualitative, where feedback about a particular area or areas of interest is sought. A sentiment analysis strategy applied to the qualitative data can also yield actionable intelligence. Quantitative feedback can alternatively be thought of as structured data, whereas qualitative feedback represents unstructured data.
Undoubtedly, quantitative feedback has its own advantages. In paper , some of the shortcomings of quantitative feedback pointed out by the authors are (i) good survey questions are hard to write, and (ii) the data may provide a generic picture but lack depth. As far as the academic sector is concerned, from our own experience we felt that a question on infrastructure may fetch a score of 3 out of 5, yet it fails to reveal anything actionable. By contrast, consider a qualitative feedback item such as “Labs and Classrooms are okay. More flexible Library timings would help. Maybe some Cloud setup for doing complex and memory intensive/CPU intensive jobs.” From this, it can be well understood that the user is giving feedback on laboratories, classrooms and libraries. Herein lies the challenge: although textual feedback is more informative, finding the aspects or features and then attaching a score to each is not trivial. These challenges motivate our current work. As observed in , a fixed questionnaire actually limits the user’s capacity to give feedback, because there may be various other aspects outside the questionnaire on which the user has an opinion.
Textual feedback has been used successfully in various domains, mostly in assessing customer feedback. In , the authors describe a tool with interactive visualization for hotel customers. Article  is about the tourism industry. As acknowledged in , feedback is extremely important for academic institutes in maintaining quality. Text mining has been applied to track feedback for online learning systems, to extract concepts from learning materials  and for teacher evaluation. In education as well, various kinds of feedback are collected from different stakeholders such as industry practitioners, faculty, students, parents and alumni. Alumni form a very important part of any academic institute’s ecosystem. In this paper, alumni feedback in the form of textual remarks has been collected on various topics: alumni interaction, placements, infrastructure, academic discipline, faculty, extracurricular activities, focus on R&D and focus on entrepreneurship. These remarks are more insightful than mere quantitative feedback. With the help of topic modeling coupled with sentiment analysis, finer aspects of each of these areas are identified. Finally, simple visualization techniques are used to interpret the results. However, as this is textual data, there may be semantic ambiguity as well as spelling errors.
The key contributions of this work are:
- The model is available as an open source package for use by all .
- It produces better results than the reference methods.
- The data are real, so practical engineering techniques to handle negation and the degree of sentiment are discussed.
- The process is built generically, so it can be extended to any textual data.
The rest of the paper is organized as follows. In Sect. 2, we discuss related work in this area. In Sect. 3, the proposed methodology is presented. In Sect. 4, the parameters of the experimental setup are covered. In Sect. 5, the results are presented with the necessary analysis. Section 6 contains the conclusion.
2 Related Works
In , the authors discuss a technique for evaluating teachers’ feedback sent over SMS. It consists of standard natural language processing (NLP) steps such as part-of-speech (POS) tagging, named entity recognition (NER) and stemming. SMS also poses the challenge of misspelled or abbreviated words. Different concepts were extracted and a sentiment analysis was performed. There is a strong need for identifying the hidden topics or sub-themes in feedback; in paper , the authors propose an ontology-based solution to this problem. Mathias et al.  present their findings from a large corpus of feedback about a specific course on industrial design. In , the authors collected feedback about teachers as running text; the usual preprocessing steps are followed by aspect identification and then sentiment analysis. The text-based method appears to extract feedback about many more features than are generally captured by a numeric score-based system. In paper , the authors work with MOOC comments; apart from traditional text preprocessing and sentiment analysis, a correlation analysis between the various sentiments is also performed. In paper , the authors demonstrate how an NLP (natural language processing) based system can quickly identify major areas of concern in an e-learning system, without the need to go through volumes of survey data. It may be noted that many of the research works mentioned above are quite recent, from 2014 and 2015, which signifies the relevance of the current work.
3 Proposed Methodology
In this section, the proposed methodology is elaborated in terms of its sequential steps.
Data Collection: Data from alumni have been collected using Google Forms. The form invited textual feedback about various dimensions: alumni interaction, placements, infrastructure, academic discipline, faculty, extracurricular activities, focus on R&D and focus on entrepreneurship. These aspects are referred to as the academic dimensions in subsequent discussions.
Preprocessing: For sentiment analysis, standard techniques such as whitespace removal, punctuation removal and spelling correction have been applied. For the sub-dimension sentiment analysis, techniques such as normalization, stemming and stop word removal have been applied before topic modeling.
Topic Modeling: Various sub-themes, or aspects, of each dimension are extracted using topic modeling, a standard process in text mining through which the hidden semantic structure of the text can be discovered. For a detailed understanding of topic models, the paper  can be referred to.
Sentiment Analysis: Sentiment analysis is performed at two levels. At level 1, it is performed directly on the academic dimensions. At level 2, it is performed on the sub-dimensions extracted through topic modeling. The sentiment analysis uses a lexicon-based method detailed in Sect. 4. Simple strategies have been adopted for handling negation and sarcasm.
Visualization: Simple visualizations are suggested: word clouds for identifying the topics discussed in an academic dimension, and bar charts to depict the distribution of sentiment scores.
The entire methodology is available in a package named ‘RSentiment’  in the Comprehensive R Archive Network (CRAN).
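The lexicon-based sentiment step, with negation flipping and superlative/comparative intensification, can be sketched as follows. This is a simplified Python stand-in for the R package, with hypothetical miniature word lists (the actual tool uses the full Liu et al. lexicon) and no sarcasm handling:

```python
import re

# Hypothetical miniature lexicons for illustration only; the paper uses the
# Liu et al. opinion word lists (2000+ positive, 4000+ negative words).
POSITIVE = {"good", "great", "helpful", "excellent", "flexible"}
NEGATIVE = {"bad", "poor", "minimal", "less"}
NEGATIONS = {"no", "not", "never", "none"}
INTENSIFIERS = {"very", "most", "extremely"}  # stand-ins for superlative/comparative cues


def preprocess(text):
    """Lowercase, strip punctuation and split on whitespace."""
    return re.sub(r"[^\w\s]", " ", text.lower()).split()


def classify(sentence):
    """Classify into one of five of the six categories used by the tool:
    Very Negative, Negative, Neutral, Positive, Very Positive."""
    score = 0
    negated = False
    intensified = False
    for tok in preprocess(sentence):
        if tok in NEGATIONS:
            negated = True
            continue
        if tok in INTENSIFIERS:
            intensified = True
            continue
        polarity = 1 if tok in POSITIVE else -1 if tok in NEGATIVE else 0
        if polarity:
            if negated:
                polarity = -polarity  # flip polarity after a negation word
            score += polarity * (2 if intensified else 1)
        negated = False
        intensified = False
    if score >= 2:
        return "Very Positive"
    if score == 1:
        return "Positive"
    if score == 0:
        return "Neutral"
    if score == -1:
        return "Negative"
    return "Very Negative"
```

On an input such as “no flexible timings”, the negation word flips the positive word ‘flexible’, yielding a negative classification; plain lexicon matching without the flip would misclassify it as positive.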
4 Experimental Setup
The feedback was collected using Google Forms, and at the time of the analysis the number of respondents was well over 60. The survey was conducted anonymously to encourage candid inputs. ‘R’  is used as the computational environment. For the lists of positive and negative words, the work of Liu et al.  was used; this is an exhaustive list with 2000+ positive and 4000+ negative words. The intensity is assumed to increase from positive to very positive, or from negative to very negative, with the use of superlative or comparative words. A sentence or phrase is neutral if it contains no positive or negative words. Negation is treated in a simple lexicon-based manner, based on the occurrence of certain words or punctuation. For level 2, topic modeling is done after stop word removal and stemming. The most discussed topics were identified; for each topic, we extracted the feedback focusing on that topic within the academic dimension and applied the sentiment analysis algorithm to find the sentiment of the feedback on that topic.
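The level-2 grouping of feedback by most-discussed topic can be illustrated with a crude frequency-based stand-in for full topic modeling (the paper uses topic models applied after stop word removal and stemming). The feedback items and word lists below are hypothetical:

```python
from collections import Counter

# Hypothetical stop word list and feedback for the dimension 'infrastructure'
STOP_WORDS = {"the", "is", "are", "a", "an", "and", "of", "in", "more", "some", "need", "new"}
feedback = [
    "labs are good",
    "library timings are not flexible",
    "labs need new computers",
    "classrooms are okay",
]


def top_terms(docs, k=3):
    """Most-discussed terms after stop word removal; a crude proxy for
    the sub-topics a full topic model would discover."""
    counts = Counter(w for d in docs for w in d.lower().split() if w not in STOP_WORDS)
    return [w for w, _ in counts.most_common(k)]


def feedback_on(topic, docs):
    """All feedback items mentioning a given sub-topic, ready for
    per-topic sentiment analysis."""
    return [d for d in docs if topic in d.lower().split()]


topics = top_terms(feedback)  # 'labs' appears twice, so it surfaces first
```

Each sub-topic's matching feedback would then be passed through the sentiment classifier to obtain the per-topic sentiment distribution reported in Sect. 5.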
5 Results and Discussion
This section is organized into three subsections. In the first, we choose one academic dimension and analyze the feedback received on it: we identify the number of sentences (feedback items) under each sentiment category, assign a score to each sentence, plot a graph, and present a word cloud identifying the most discussed topics in this area. In the second, we choose one academic dimension and analyze the sentiment on each of its most discussed topics to gain valuable insights into that dimension. In the last, we compare our method with other open source reference methods.
5.1 Overall Sentiment
[Table: Sentiment analysis results for the academic dimensions ‘faculty’ and ‘placement’, showing the number of feedback items in each sentiment category for each dimension.]
An analysis of the total positive opinions as a percentage of total opinions (excluding neutral) was performed. Alumni interaction, placements, infrastructure, academic discipline, faculty, extracurricular activities, focus on R&D and focus on entrepreneurship score 48%, 77%, 71%, 61%, 80%, 38%, 50% and 54%, respectively. This very conveniently gives the institute’s management insight into the strong and weak academic dimensions.
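The percentage computation is straightforward; the sketch below reproduces it for hypothetical per-category counts of one dimension. Grouping a sarcasm category with the negatives is our assumption here, not something the paper specifies:

```python
# Hypothetical counts per sentiment category for one academic dimension
counts = {"Very Positive": 5, "Positive": 1, "Neutral": 2,
          "Negative": 0, "Very Negative": 2, "Sarcasm": 0}

positive = counts["Very Positive"] + counts["Positive"]
# Assumption: sarcastic remarks are counted with the negatives
negative = counts["Negative"] + counts["Very Negative"] + counts["Sarcasm"]

# Positive opinions as a percentage of all non-neutral opinions
pct_positive = round(100 * positive / (positive + negative))
# 6 of 8 non-neutral items are positive -> 75
```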
5.2 Sentiment with Topic Modeling
In this section, we select one academic dimension, ‘Infrastructure’, and divide it into various sub-topics obtained by applying topic modeling to that dimension.
[Table: Correlation between the sub-topics of the dimension ‘infrastructure’ and the six sentiment categories.]
In the above table, we can see that ‘lab’ is the most discussed topic, with 2 neutral, 1 positive, 2 very negative and 5 very positive feedback items. Thus we gain a detailed insight into the academic dimension and each of its sub-topics extracted from the feedback text by topic modeling. Further analysis of extracurricular activities shows that the technical and cultural fests have received relatively fewer negative remarks than sports.
5.3 Comparison with Other Methods
[Table: Comparison with QDAP on sample phrases such as “We got very less scope to interact with our seniors”, “Has become very minimal” and “Very less to none”.]
It is observed that the proposed methodology correctly classifies the polarity of these phrases. Our method appears especially effective for sentences containing negation. We have also compared with Sentiment ; however, it classifies a phrase only as positive or negative.
6 Conclusion
It is well established that textual feedback is critical for any academic institute and often produces superior insights compared to quantitative feedback. In this paper, a methodology has been proposed to perform sentiment analysis in conjunction with topic modeling on textual feedback. The proposed methodology is available as an open source package  in the Comprehensive R Archive Network (CRAN). The efficacy of the method has been tested on a sizeable corpus of alumni feedback collected using Google Forms. Our proposed method can, firstly, identify how the institute is doing in each of the academic dimensions and, secondly, with the help of topic modeling, identify the various sub-dimensions and the sentiment in each of them. As an example, the tool automatically identified various areas under infrastructure, namely lab, classroom, seminar facility, library and computers. Some engineering improvements are applied to handle negation. The resulting visualizations can prove quite beneficial to authorities. In particular, it may be noted that statements like ‘no interaction’ or ‘has become very minimal’ are often classified incorrectly by current open source tools. As an extension, we intend to test with larger corpora of feedback and compare with more state-of-the-art techniques.
- 1. P.K. Agrawal, A.S. Alvi, Textual feedback analysis: review, in International Conference on Computing Communication Control and Automation (ICCUBEA), pp. 457–460
- 2. A. Kumar, R. Jain, Sentiment analysis and feedback evaluation, in 2015 IEEE 3rd International Conference on MOOCs, Innovation and Technology in Education (MITE) (IEEE, 2015), pp. 433–436
- 6. S. Bose, RSentiment: Analyse Sentiment of English Sentences. R package version 1.0.4 (2016). https://CRAN.R-project.org/package=RSentiment
- 8. M. Funk, M. van Diggelen, Feeding a monster or doing good? Mining industrial design student feedback at large, in Proceedings of the 2014 Workshop on Interaction Design in Educational Environments (ACM, 2014), p. 59
- 9. B.K.P. Conrad, A. Divinsky, Mining student-generated textual data in MOOCs and quantifying their effects on student performance and learning outcomes, in 2014 ASEE Annual Conference, Indianapolis, Indiana (2014)
- 10. W.-B. Yu, R. Luna, Exploring user feedback of an e-learning system: a text mining approach, in Human Interface and the Management of Information. Information and Interaction for Learning, Culture, Collaboration and Business (Springer Berlin Heidelberg, 2013), pp. 182–191
- 11. H.M. Wallach, Topic modeling: beyond bag-of-words, in Proceedings of the 23rd International Conference on Machine Learning (ACM, 2006), pp. 977–984
- 12. R Core Team, R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria (2013). ISBN 3-900051-07-0. http://www.R-project.org/
- 13. B. Liu, M. Hu, J. Cheng, Opinion observer: analyzing and comparing opinions on the web, in Proceedings of the 14th International World Wide Web Conference (WWW-2005), 10–14 May 2005, Chiba, Japan
- 14. H. Wickham, The split-apply-combine strategy for data analysis. J. Stat. Softw. 40(1), 1–29 (2011). http://www.jstatsoft.org/v40/i01/
- 15. H. Wickham, stringr: Simple, consistent wrappers for common string operations. R package version 1.0.0 (2015). https://CRAN.R-project.org/package=stringr
- 16. I. Feinerer, K. Hornik, D. Meyer, Text mining infrastructure in R. J. Stat. Softw. 25(5), 1–54 (2008). http://www.jstatsoft.org/v25/i05/
- 17. W. Chang, J. Cheng, J.J. Allaire, Y. Xie, J. McPherson, shiny: Web Application Framework for R. R package version 0.13.1 (2016). https://CRAN.R-project.org/package=shiny
- 18. I. Fellows, wordcloud: Word Clouds. R package version 2.5 (2014). https://CRAN.R-project.org/package=wordcloud
- 19. T.W. Rinker, qdap: Quantitative Discourse Analysis Package. Version 2.2.4 (University at Buffalo, Buffalo, New York, 2013). http://github.com/trinker/qdap
- 20. T.P. Jurka, sentiment: Tools for sentiment analysis. R package version 0.2 (2012). https://CRAN.R-project.org/package=sentiment