The JDPA Sentiment Corpus for the Automotive Domain

Handbook of Linguistic Annotation

Abstract

This chapter presents a rich annotation scheme covering mentions, co-reference, meronymy, sentiment expressions, and modifiers of sentiment expressions (including neutralizers, negators, and intensifiers), and describes a large corpus annotated with this scheme. We define the annotation types, provide examples, and report statistics on occurrence and inter-annotator agreement. This resource is the largest sentiment-topical corpus to date and is publicly available. It helps quantify sentiment phenomena, supports the construction of advanced sentiment systems, and enables direct comparison of different algorithms.
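
As an illustration of how inter-annotator agreement on a nominal label can be quantified, the sketch below computes Cohen's kappa [8] for two annotators. It is a minimal example under stated assumptions, not the procedure used in the chapter: the function name `cohen_kappa` and the toy polarity labels are hypothetical, and the abstract does not say which agreement measure was applied to which annotation type.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Hypothetical helper for illustration only; not taken from the chapter.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy data (assumed): polarity labels for four sentiment expressions.
annotator_1 = ["positive", "positive", "negative", "negative"]
annotator_2 = ["positive", "negative", "negative", "negative"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

Here the annotators agree on three of four items (observed agreement 0.75) while chance agreement is 0.5, so kappa is 0.5. Kappa corrects raw agreement for the agreement expected from the label distribution alone, which matters when one label dominates.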

Work was conducted while both authors were at J.D. Power and Associates Web Intelligence, McGraw Hill.

Notes

  1. We draw a distinction between the immediate target of a sentiment expression and a document-level topic. Other work, such as [27], has addressed the problem of developing topic-dependent feature sets for supervised classification of document-level polarity.

  2. Called “negatives” in [29].

  3. The TimeML corpus [30] has explicit annotations for counter-factive events and treats negation as a property of an event. We believe that both act the same way with respect to contextual polarity.

  4. Reference [31] presents a corpus containing “certainty markers”, or expressions indicating commitment to a sentence or a clause and its level of certainty, on a scale from uncertain through absolute certainty. Our committers are judged on a binary scale: do they raise or lower the author’s commitment to a sentiment expression or modification?

  5. The problem of determining whether an event is asserted as true, as false, or with unknown truth-value is called veridicity [16]. Reference [18] presents a rule-based system for recognizing the veridicity of some clauses. The system is tailored to the blogosphere, and its released lexicon includes “neutral veridicality elements”, which neutralize their argument clauses.

  6. Discussion of descriptors is omitted due to space constraints. See the annotation guidelines [10] for details about this annotation.

References

  1. Asher, N., Benamara, F., Mathieu, Y.Y.: Distilling opinion in discourse: a preliminary study. In: Coling 2008: Companion volume: Posters, pp. 7–10, Coling Organizing Committee, Manchester, UK (2008)

  2. Bloom, K.: Sentiment analysis based on appraisal theory and functional local grammars. Ph.D. Dissertation, Illinois Institute of Technology (2011)

  3. Breck, E., Cardie, C.: Playing the telephone game: determining the hierarchical structure of perspective and speech expressions. In: COLING (2004)

  4. Breck, E., Choi, Y., Cardie, C.: Identifying expressions of opinion in context. In: IJCAI (2007)

  5. Brown, G.I.: An error analysis of relation extraction in social media documents. In: Proceedings of the ACL 2011 Student Session, HLT-SS ’11, pp. 64–68. Association for Computational Linguistics, Stroudsburg, PA, USA (2011)

  6. Choi, Y., Cardie, C.: Learning with compositional semantics as structural inference for subsentential sentiment analysis. In: EMNLP (2008)

  7. Choi, Y., Kim, Y., Myaeng, S.-H.: Domain-specific sentiment analysis using contextual feature generation. In: TSA (2009)

  8. Cohen, J.: A coefficient of agreement for nominal scales. Educ. Psychol. Meas. 20(1), 37–46 (1960)

  9. Ding, X., Liu, B., Yu, P.S.: A holistic lexicon-based approach to opinion mining. In: WSDM (2008)

  10. Eckert, M., Clark, L., Lind, H., Kessler, J., Nicolov, N.: Structural sentiment and entity annotation guidelines. J.D. Power and Associates Technical Report (2010)

  11. Fahrni, A., Klenner, M.: Old wine or warm beer: target-specific sentiment analysis of adjectives. In: AISB (2008)

  12. Ginsca, A.-L.: Fine-grained opinion mining as a relation classification problem. In: Jones, A.V. (ed.) ICCSW. OASICS, Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, vol. 28, pp. 56–61. Germany (2012)

  13. Girju, R., Badulescu, A., Moldovan, D.: Automatic discovery of part-whole relations. Comput. Linguist. 32(1), 83–135 (2006)

  14. Hu, M., Liu, B.: Mining and summarizing customer reviews. In: KDD (2004)

  15. Jbara, A.A.: Using natural language processing to mine multiple perspectives from social media and scientific literature. Ph.D. Dissertation, The University of Michigan (2013)

  16. Karttunen, L., Zaenen, A.: Veridicity. In: Annotating, extracting and reasoning about time and events (2005)

  17. Kessler, W., Kuhn, J.: Detection of product comparisons - how far does an out-of-the-box semantic role labeling system take you? In: EMNLP, pp. 1892–1897. ACL (2013)

  18. Kessler, J.S.: Polling the blogosphere: a rule-based approach to belief classification. In: ICWSM (2008)

  19. Kessler, J.S., Nicolov, N.: Targeting sentiment expressions through supervised ranking of linguistic configurations. In: ICWSM (2009)

  20. Kessler, J.S., Eckert, M., Clark, L., Nicolov, N.: The 2010 ICWSM JDPA sentiment corpus for the automotive domain. In: ICWSM-DWC (2010)

  21. Kim, S.-M., Hovy, E.: Determining the sentiment of opinions. In: COLING (2004)

  22. Kim, S.-M., Hovy, E.: Extracting opinions, opinion holders, and topics expressed in online news media text. In: ACL Workshop on sentiment and subjectivity in text (2006)

  23. Krestel, R., Witte, R., Bergler, S.: Minding the source: automatic tagging of reported speech in newspaper articles. In: LREC (2008)

  24. Moilanen, K., Pulman, S.: Multi-entity sentiment scoring. In: RANLP (2009)

  25. Ng, V., Cardie, C.: Improving machine learning approaches to coreference resolution. In: ACL (2002)

  26. NIST Speech Group: The ACE 2006 evaluation plan: evaluation of the detection and recognition of ACE entities, values, temporal expressions, relations, and events (2006)

  27. Nowson, S.: Scary movies good, scary flights bad: topic driven feature selection for classification of sentiment. In: TSA (2009)

  28. Ogren, P.V.: Knowtator: a Protégé plug-in for annotated corpus construction. In: NAACL-HLT (2006)

  29. Polanyi, L., Zaenen, A.: Contextual valence shifters. In: Computing attitude and affect in text: theory and applications (2006)

  30. Pustejovsky, J., Hanks, P., Sauri, R., See, A., Gaizauskas, R., Setzer, A., Radev, D., Sundheim, B., Day, D., Ferro, L., Lazo, M.: The TimeBank corpus. In: Corpus Linguistics (2003)

  31. Rubin, V.L.: Stating with certainty or stating with doubt: intercoder reliability results for manual annotation of epistemically modalized statements. In: NAACL-HLT (2007)

  32. Ruppenhofer, J., Somasundaran, S., Wiebe, J.: Finding the sources and targets of subjective expressions. In: LREC (2008)

  33. Shaikh, M.A.M., Prendinger, H., Ishizuka, M.: Sentiment assessment of text by analyzing linguistic features and contextual valence assignment. Appl. Artif. Intell. 22(6), 558–601 (2008)

  34. Su, F., Markert, K.: From words to senses: a case study of subjectivity recognition. In: COLING (2008)

  35. Tsur, O., Davidov, D., Rappoport, A.: ICWSM - a great catchy name: semi-supervised recognition of sarcastic sentences in product reviews. In: ICWSM (2010)

  36. Vaswani, V.: Predicting sentiment-mention associations in product reviews. Ph.D. Dissertation, Kansas State University (2012)

  37. Wiebe, J., Mihalcea, R.: Word sense and subjectivity. In: ACL (2006)

  38. Wiebe, J., Wilson, T., Cardie, C.: Annotating expressions of opinions and emotions in language. In: LREC (2005)

  39. Wiegand, M., Klakow, D.: Topic-related polarity classification of blog sentences. In: EPIA (2009)

  40. Wilson, T., Wiebe, J.: Annotating opinions in the world press. In: SIGdial (2003)

  41. Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing contextual polarity in phrase-level sentiment analysis. In: HLT-EMNLP (2005)

  42. Wilson, T., Wiebe, J., Hoffmann, P.: Recognizing contextual polarity: an exploration of features for phrase-level sentiment analysis. Comput. Linguist. 35(3), 399–433 (2009)

  43. Wilson, T.A.: Fine-grained subjectivity and sentiment analysis: recognizing the intensity, polarity, and attitudes of private states. Ph.D. Dissertation, University of Pittsburgh (2008)

  44. Winston, M.E., Chaffin, R., Herrmann, D.: A taxonomy of part-whole relations. Cognit. Sci. 11(4), 417–444 (1987)

  45. Yu, N., Kübler, S.: Filling the gap: semi-supervised learning for opinion detection across domains. In: Proceedings of the Fifteenth Conference on Computational Natural Language Learning, pp. 200–209. Association for Computational Linguistics (2011)

Acknowledgements

We would like to thank Prof. Martha Palmer, Prof. James Martin, and Prof. Michael Mozer at the University of Colorado, Prof. Michael Gasser at Indiana University, and Dr. William Headden at J.D. Power and Associates for their helpful discussions. Dr. Miriam Eckert and Lyndsie Clark assisted with an earlier iteration of the corpus description [20].

Author information

Corresponding author

Correspondence to Jason S. Kessler.

Copyright information

© 2017 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Kessler, J.S., Nicolov, N. (2017). The JDPA Sentiment Corpus for the Automotive Domain. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_30

  • DOI: https://doi.org/10.1007/978-94-024-0881-2_30

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-024-0879-9

  • Online ISBN: 978-94-024-0881-2

  • eBook Packages: Social Sciences, Social Sciences (R0)
