
SABINE: A Multi-purpose Dataset of Semantically-Annotated Social Content

  • Silvana Castano
  • Alfio Ferrara
  • Enrico Gallinucci
  • Matteo Golfarelli (corresponding author)
  • Stefano Montanelli
  • Lorenzo Mosca
  • Stefano Rizzi
  • Cristian Vaccari
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11137)

Abstract

Social Business Intelligence (SBI) is the discipline that combines corporate data with social content to let decision makers analyze the trends perceived from the environment. SBI poses research challenges in several areas, such as IR, data mining, and NLP; unfortunately, SBI research is often restrained by the lack of publicly available, real-world data for experimenting with approaches, and by the difficulty of determining a ground truth. To fill this gap we present SABINE, a modular dataset in the domain of European politics. SABINE includes 6 million bilingual clips crawled from 50 000 web sources, each associated with metadata and sentiment scores; an ontology with 400 topics, their occurrences in the clips, and their mapping to DBpedia; and two multidimensional cubes for analyzing and aggregating sentiment and semantic occurrences. We also propose a set of research challenges that can be addressed using SABINE; notably, the presence of an expert-validated ground truth makes it possible to test approaches to the whole SBI process as well as to each single task.
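
To make the idea of aggregating sentiment over topic occurrences more concrete, the following is a minimal sketch of a cube-style roll-up in Python. It does not reflect the actual SABINE schema or tooling: the field names (`clip_id`, `language`, `topic`, `sentiment`), the sample topics, and the sentiment values are all invented for illustration, and the `roll_up` helper is hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical topic occurrences: one row per (clip, topic) pair.
# Field names and values are illustrative assumptions, not SABINE's schema.
occurrences = [
    {"clip_id": 1, "language": "en", "topic": "Brexit", "sentiment": -0.5},
    {"clip_id": 2, "language": "it", "topic": "Brexit", "sentiment": 0.5},
    {"clip_id": 3, "language": "en", "topic": "Euro", "sentiment": 0.25},
    {"clip_id": 4, "language": "en", "topic": "Euro", "sentiment": 0.75},
]


def roll_up(records, dimensions, measure="sentiment"):
    """Group records by the given dimensions and aggregate the measure.

    Returns a dict mapping each tuple of dimension values to the number
    of records in that group and their average measure value.
    """
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[d] for d in dimensions)
        groups[key].append(record[measure])
    return {
        key: {"count": len(values), "avg": mean(values)}
        for key, values in groups.items()
    }


# Coarse view: average sentiment per topic.
by_topic = roll_up(occurrences, ["topic"])

# Finer view: drill down by adding the language dimension.
by_topic_language = roll_up(occurrences, ["topic", "language"])

for key, cell in sorted(by_topic_language.items()):
    print(key, cell)
```

Calling `roll_up` with fewer dimensions gives a coarser view of the same data, while adding a dimension drills down; this is the kind of analysis a multidimensional cube is meant to support.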

Keywords

Dataset, Social technologies, Sentiment analysis, Text analysis

Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Silvana Castano (1)
  • Alfio Ferrara (1)
  • Enrico Gallinucci (2)
  • Matteo Golfarelli (2), corresponding author
  • Stefano Montanelli (1)
  • Lorenzo Mosca (3)
  • Stefano Rizzi (2)
  • Cristian Vaccari (4, 5)

  1. DI, University of Milan, Milan, Italy
  2. DISI, University of Bologna, Bologna, Italy
  3. DFCS, University of Roma Tre, Rome, Italy
  4. Royal Holloway University of London, Egham, UK
  5. University of Bologna, Bologna, Italy
